
Keypoint-Augmented Self-Supervised Learning for Medical Image Segmentation with Limited Annotation (2310.01680v2)

Published 2 Oct 2023 in cs.CV and cs.AI

Abstract: Pretraining CNN models (e.g., UNet) through self-supervision has become a powerful approach to facilitate medical image segmentation under low annotation regimes. Recent contrastive learning methods encourage similar global representations when the same image undergoes different transformations, or enforce invariance across different image/patch features that are intrinsically correlated. However, CNN-extracted global and local features are limited in capturing long-range spatial dependencies that are essential in biological anatomy. To this end, we present a keypoint-augmented fusion layer that extracts representations preserving both short- and long-range self-attention. In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial self-attention among localized keypoint features. Further, we introduce both global and local self-supervised pretraining for the framework. At the global scale, we obtain global representations both from the bottleneck of the UNet and by aggregating multiscale keypoint features. These global features are subsequently regularized through image-level contrastive objectives. At the local scale, we first define a distance-based criterion to establish correspondences among keypoints and then encourage similarity between their features. Through extensive experiments on both MRI and CT segmentation tasks, we demonstrate the architectural advantages of our proposed method in comparison to both CNN and Transformer-based UNets, when all architectures are trained with randomly initialized weights. With our proposed pretraining strategy, our method further outperforms existing SSL methods by producing more robust self-attention and achieving state-of-the-art segmentation results. The code is available at https://github.com/zshyang/kaf.git.
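
The local-scale objective in the abstract (match keypoints by a distance-based criterion, then encourage their features to be similar) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact criterion, so the sketch assumes that keypoints from two views are expressed in a shared coordinate frame, that a match is the nearest neighbour within a threshold `max_dist`, and that similarity is measured by cosine similarity. The function names `match_keypoints` and `local_similarity_loss` are hypothetical.

```python
import math


def match_keypoints(pts_a, pts_b, max_dist):
    """Pair each keypoint in pts_a with its nearest keypoint in pts_b.

    A pair is kept only if the Euclidean distance is at most max_dist.
    Assumption: both point sets live in the same coordinate frame.
    """
    pairs = []
    for i, p in enumerate(pts_a):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(pts_b):
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= max_dist:
            pairs.append((i, best_j))
    return pairs


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero


def local_similarity_loss(feats_a, feats_b, pairs):
    """Mean (1 - cosine similarity) over matched keypoint features."""
    if not pairs:
        return 0.0
    total = sum(1.0 - cosine_similarity(feats_a[i], feats_b[j]) for i, j in pairs)
    return total / len(pairs)


# Toy example with hypothetical 2D keypoints and 2-dimensional features.
pts_a = [(10.0, 10.0), (40.0, 25.0)]
pts_b = [(11.0, 10.0), (90.0, 90.0)]
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[1.0, 0.0], [1.0, 1.0]]

pairs = match_keypoints(pts_a, pts_b, max_dist=3.0)
loss = local_similarity_loss(feats_a, feats_b, pairs)
print(pairs, loss)
```

In this toy case only the first keypoint has a neighbour within the threshold, so the loss is computed over a single matched pair; unmatched keypoints contribute nothing. A contrastive variant would also push apart features of non-corresponding keypoints, which this sketch omits.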
