
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy (2401.06278v2)

Published 11 Jan 2024 in cs.CV and cs.LG

Abstract: Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the use of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) in a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest three general principles. Firstly, that self-supervised pretraining generally produces more suitable backbones for GIE vision tasks than supervised pretraining. Secondly, that self-supervised pretraining with ImageNet-1k is typically more suitable than pretraining with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy. Thirdly, that ViT-Bs are more suitable in polyp segmentation and monocular depth estimation in colonoscopy, ResNet50s are more suitable in polyp detection, and both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of approaches more suitable than the convention, and inspires further research on this topic. Code available: github.com/ESandML/SSL4GIE
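The protocol the abstract describes — take a pretrained encoder, attach a task-specific head, and fine-tune end-to-end on labelled task data — can be sketched as below. This is an illustrative stand-in, not the paper's code (see the linked repository for that): `TinyEncoder` substitutes for a ResNet50 or ViT-B backbone, and in the real experiments its weights would come from supervised or self-supervised pretraining on ImageNet-1k or Hyperkvasir-unlabelled rather than random initialisation.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a pretrained backbone (ResNet50 / ViT-B in the paper).
    In practice, load pretrained weights here instead of training from scratch."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.out_dim = dim

    def forward(self, x):
        return self.net(x)

def finetune(encoder, num_classes, images, labels, steps=5):
    """Attach a linear classification head and fine-tune encoder + head
    end-to-end, as done for the classification tasks in the study."""
    head = nn.Linear(encoder.out_dim, num_classes)
    model = nn.Sequential(encoder, head)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        opt.step()
    return model, loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 3, 64, 64)          # dummy batch of "endoscopy" images
    y = torch.randint(0, 4, (8,))          # dummy class labels (e.g. 4 findings)
    model, final_loss = finetune(TinyEncoder(), num_classes=4, images=x, labels=y)
    print(model(x).shape)
```

For dense tasks (polyp segmentation, depth estimation), the linear head would be replaced by a decoder such as DeepLabv3+ or DPT, but the fine-tuning loop is otherwise the same.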
