
SSiT: Saliency-guided Self-supervised Image Transformer for Diabetic Retinopathy Grading (2210.10969v5)

Published 20 Oct 2022 in cs.CV

Abstract: Self-supervised Learning (SSL) has been widely applied to learn image representations from unlabeled images, but it remains under-explored in medical image analysis. In this work, the Saliency-guided Self-supervised Image Transformer (SSiT) is proposed for Diabetic Retinopathy (DR) grading from fundus images. We introduce saliency maps into SSL, with the goal of guiding self-supervised pre-training with domain-specific prior knowledge. Specifically, SSiT employs two saliency-guided learning tasks: (1) Saliency-guided contrastive learning is conducted based on momentum contrast, wherein the saliency maps of fundus images are used to remove trivial patches from the input sequences of the momentum-updated key encoder. The key encoder is thus constrained to provide target representations that focus on salient regions, guiding the query encoder to capture salient features. (2) The query encoder is trained to predict the saliency segmentation, encouraging the preservation of fine-grained information in the learned representations. To assess the proposed method, four publicly accessible fundus image datasets are adopted: one is used for pre-training, while the other three are used to evaluate the pre-trained models on downstream DR grading. SSiT significantly outperforms other representative state-of-the-art SSL methods on all downstream datasets and under various evaluation settings. For example, SSiT achieves a Kappa score of 81.88% on the DDR dataset under fine-tuning evaluation, outperforming all other ViT-based SSL methods by at least 9.48%.
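The saliency-guided filtering in task (1) can be sketched in a few lines. Below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the function name filter_salient_patches, the keep_ratio parameter, and the tensor shapes are all assumptions, and it only illustrates the idea of dropping low-saliency patch tokens from the key encoder's input sequence.

    import torch

    def filter_salient_patches(patch_tokens, saliency, keep_ratio=0.5):
        # patch_tokens: (B, N, D) patch embeddings (class token excluded).
        # saliency:     (B, N) mean saliency per patch; higher = more salient.
        # keep_ratio is an illustrative assumption, not a value from the paper.
        B, N, D = patch_tokens.shape
        k = max(1, int(N * keep_ratio))
        # Indices of the k most salient patches in each image.
        idx = saliency.topk(k, dim=1).indices        # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, D)    # (B, k, D)
        # Keep only those tokens; trivial (low-saliency) patches are dropped.
        return patch_tokens.gather(1, idx)           # (B, k, D)

In a momentum-contrast setup of this kind, the filtered tokens would be fed to the momentum-updated key encoder while the query encoder still sees the full patch sequence, so the contrastive target emphasizes salient regions.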
