Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning (2410.18879v2)

Published 24 Oct 2024 in cs.CV

Abstract: This report outlines Team Seq2Cure's deep learning approach for the Capsule Vision 2024 Challenge, leveraging an ensemble of convolutional neural networks (CNNs) and transformer-based architectures for multi-class abnormality classification in video capsule endoscopy frames. The dataset comprised over 50,000 frames from three public sources and one private dataset, labeled across 10 abnormality classes. To overcome the limitations of traditional CNNs in capturing global context, we integrated CNN and transformer models within a multi-model ensemble. Our approach achieved a balanced accuracy of 86.34 percent and a mean AUC-ROC score of 0.9908 on the validation set, earning our submission 5th place in the challenge. Code is available at http://github.com/arnavs04/capsule-vision-2024.
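The abstract describes combining CNN and transformer classifiers in a multi-model ensemble, but does not reproduce the ensembling details here. The following is a minimal sketch, assuming equal-weight averaging of softmax probabilities over two illustrative `timm` backbones (one CNN, one vision transformer) and the challenge's 10 abnormality classes; the backbone names, weighting scheme, and toy evaluation are assumptions for illustration, not the team's exact configuration.

```python
# Minimal sketch of a CNN + transformer ensemble for multi-class video
# capsule endoscopy frame classification. Assumptions (not from the paper):
# backbone choices, equal-weight softmax averaging, pretrained=False.
import torch
import timm
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

NUM_CLASSES = 10  # the challenge labels frames across 10 abnormality classes

# One convolutional and one transformer backbone; the actual submission
# reportedly ensembled several models from each family.
members = [
    timm.create_model("efficientnet_b0", pretrained=False, num_classes=NUM_CLASSES),
    timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=NUM_CLASSES),
]

@torch.no_grad()
def ensemble_probs(frames: torch.Tensor) -> torch.Tensor:
    """Average softmax probabilities over all ensemble members."""
    per_model = []
    for model in members:
        model.eval()
        per_model.append(torch.softmax(model(frames), dim=1))
    return torch.stack(per_model).mean(dim=0)  # shape: (batch, NUM_CLASSES)

# Toy evaluation mirroring the reported metrics: balanced accuracy and
# macro-averaged one-vs-rest AUC-ROC. One dummy frame per class so that
# both metrics are computable on random data.
frames = torch.randn(NUM_CLASSES, 3, 224, 224)  # dummy 224x224 RGB frames
labels = torch.arange(NUM_CLASSES)              # dummy ground-truth labels
probs = ensemble_probs(frames)

bal_acc = balanced_accuracy_score(labels.numpy(), probs.argmax(dim=1).numpy())
mean_auc = roc_auc_score(
    labels.numpy(), probs.numpy(), multi_class="ovr", average="macro"
)
print(f"balanced accuracy: {bal_acc:.4f}, mean AUC-ROC: {mean_auc:.4f}")
```

In practice the member probabilities could also be weighted by per-model validation performance; the 86.34 percent balanced accuracy and 0.9908 mean AUC-ROC quoted above come from the team's actual trained ensemble, not from this sketch.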

