
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision (2404.15672v1)

Published 24 Apr 2024 in cs.CV

Abstract: Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning models excel at learning multi-level feature spaces but often lack explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework extending Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared against 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods on diverse downstream tasks. The greater generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, a property overlooked in existing SSL methods. All code and pretrained models are available at https://github.com/JLiangLab/Eden.
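
The three branches described in the abstract can be pictured as three losses applied to one shared encoder: a discriminative objective for localizability, a parts-to-whole prediction for composability, and a whole-to-parts prediction for decomposability. The sketch below is a minimal, hypothetical illustration of that structure, not the authors' implementation (see the linked repository for the real code): the tiny CNN encoder, the 2x2 part-cropping scheme, the InfoNCE stand-in for the localizability branch, and the small MLP composer/decomposer heads are all assumptions made for this example.

```python
# Illustrative sketch of a three-branch part-whole SSL objective.
# NOT the Adam-v2 implementation; all architectural choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Tiny CNN encoder standing in for the backbone (assumption)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def localizability_loss(z1, z2, temperature=0.1):
    """InfoNCE between two views of the same anatomical patch, so that
    different anatomical patterns become discriminable (a standard
    contrastive objective is used here as a stand-in)."""
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def split_into_parts(x):
    """Split each image into a 2x2 grid of non-overlapping parts."""
    _, _, h, w = x.shape
    return [x[:, :, i:i + h // 2, j:j + w // 2]
            for i in (0, h // 2) for j in (0, w // 2)]

class PartWholeSketch(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = Encoder(dim)
        # Composer: aggregates part embeddings into a predicted whole embedding.
        self.composer = nn.Sequential(
            nn.Linear(4 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Decomposer: predicts the part embeddings from the whole embedding.
        self.decomposer = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4 * dim))

    def forward(self, view1, view2):
        z1, z2 = self.encoder(view1), self.encoder(view2)
        parts = [self.encoder(p) for p in split_into_parts(view1)]
        part_cat = torch.cat(parts, dim=-1)

        loc = localizability_loss(z1, z2)
        # Composability (parts-to-whole): the composed embedding should match
        # the embedding of the whole image.
        comp = F.mse_loss(F.normalize(self.composer(part_cat), dim=-1),
                          z1.detach())
        # Decomposability (whole-to-parts): the whole embedding should predict
        # its part embeddings.
        decomp = F.mse_loss(self.decomposer(z1), part_cat.detach())
        return loc + comp + decomp

# Usage with random grayscale crops (illustrative only).
model = PartWholeSketch()
view1 = torch.randn(8, 1, 64, 64)
view2 = torch.randn(8, 1, 64, 64)
loss = model(view1, view2)
loss.backward()
```

In this toy setup the prediction targets are detached, so each objective pushes the predicting side (the part embeddings for composability, the whole embedding for decomposability) rather than its target; how the actual framework defines and balances the three branches is described in the paper.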

References (108)
  1. Chest x-rays (indiana university). https://www.kaggle.com/datasets/raddar/chest-xrays-indiana-university.
  2. Padchest: A large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis, 66:101797, 2020.
  3. Deep vit features as dense visual descriptors. ECCVW What is Motion For?, 2022.
  4. Big self-supervised models advance medical image classification. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3458–3468, 2021.
  5. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nature Biomedical Engineering, 7:756–779, 2023.
  6. Masked autoencoders enable efficient knowledge distillers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24256–24265, 2023.
  7. Vicreg: Variance-invariance-covariance regularization for self-supervised learning. CoRR, abs/2105.04906, 2021.
  8. Vicregl: Self-supervised learning of local visual features, 2022a.
  9. VICReg: Variance-invariance-covariance regularization for self-supervised learning. In International Conference on Learning Representations, 2022b.
  10. On the opportunities and risks of foundation models, 2021.
  11. Robust vessel segmentation in fundus images. International Journal of Biomedical Imaging, 2013.
  12. Three guidelines you should know for universally slimmable self-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15742–15751, 2023.
  13. Unsupervised learning of visual features by contrasting cluster assignments. In Advances in Neural Information Processing Systems, pages 9912–9924. Curran Associates, Inc., 2020.
  14. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9650–9660, 2021.
  15. Contrastive learning of global and local features for medical image segmentation with limited annotations. In Advances in Neural Information Processing Systems, pages 12546–12558. Curran Associates, Inc., 2020.
  16. Mixed autoencoder for self-supervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22742–22751, 2023.
  17. Self-supervised learning for medical image analysis using image context restoration. Medical Image Analysis, 58:101539, 2019.
  18. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, pages 1597–1607. PMLR, 2020a.
  19. Big self-supervised models are strong semi-supervised learners, 2020b.
  20. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15750–15758, 2021.
  21. Improved baselines with momentum contrastive learning, 2020c.
  22. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9640–9649, 2021.
  23. Can ai help in screening viral and covid-19 pneumonia? IEEE Access, 8:132665–132676, 2020.
  24. Robust contrastive learning against noisy views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16670–16681, 2022.
  25. Eyepacs: An adaptable telemedicine system for diabetic retinopathy screening. Journal of Diabetes Science and Technology, 3(3):509–516, 2009.
  26. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
  27. Whitening for self-supervised representation learning. In Proceedings of the 38th International Conference on Machine Learning, pages 3015–3024. PMLR, 2021.
  28. Evolved part masking for self-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10386–10395, 2023.
  29. Anatomy-aware contrastive representation learning for fetal ultrasound. In Computer Vision – ECCV 2022 Workshops, pages 422–436, 2023.
  30. Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13689–13698, 2022.
  31. Obow: Online bag-of-visual-words generation for self-supervised learning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6826–6836, Los Alamitos, CA, USA, 2021. IEEE Computer Society.
  32. Jacob Gildenblat and contributors. Pytorch library for cam methods. https://github.com/jacobgil/pytorch-grad-cam, 2021.
  33. Bootstrap your own latent - a new approach to self-supervised learning. In Advances in Neural Information Processing Systems, pages 21271–21284. Curran Associates, Inc., 2020.
  34. Hcsc: Hierarchical contrastive selective coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9706–9715, 2022a.
  35. Discriminative, restorative, and adversarial learning: Stepwise incremental pretraining. In Domain Adaptation and Representation Transfer, pages 66–76, 2022b.
  36. Stepwise incremental pretraining for integrating discriminative, restorative, and adversarial learning. Medical Image Analysis, page 103159, 2024.
  37. Learning semantics-enriched representation via self-discovery, self-classification, and self-restoration. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, pages 137–147, Cham, 2020. Springer International Publishing.
  38. Transferable visual words: Exploiting the semantics of anatomical patterns for self-supervised learning. IEEE Transactions on Medical Imaging, 40(10):2857–2868, 2021.
  39. Dira: Discriminative, restorative, and adversarial learning for self-supervised medical image analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20824–20834, 2022.
  40. Self-supervised learning for medical image analysis: Discriminative, restorative, or adversarial? Medical Image Analysis, 94:103086, 2024.
  41. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  42. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  43. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16000–16009, 2022.
  44. Geoffrey Hinton. How to represent part-whole hierarchies in a neural network. Neural Computation, 35(3):413–452, 2023.
  45. Geoffrey E. Hinton. Some demonstrations of the effects of structural descriptions in mental imagery. Cognitive Science, 3:231–250, 1979.
  46. Geoffrey E. Hinton. Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46(1):47–75, 1990.
  47. Matrix capsules with EM routing. In International Conference on Learning Representations, 2018.
  48. A systematic benchmarking analysis of transfer learning for medical image analysis. In Domain Adaptation and Representation Transfer, and Affordable Healthcare and AI for Resource Diverse Global Health, pages 3–13, Cham, 2021. Springer International Publishing.
  49. Caid: Context-aware instance discrimination for self-supervised learning in medical imaging. In Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, pages 535–551. PMLR, 2022.
  50. Anatomy-aware self-supervised learning for aligned multi-modal medical data. In British Machine Vision Conference, 2022.
  51. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. arXiv:1901.07031, 2019.
  52. Two public chest x-ray datasets for computer-aided screening of pulmonary diseases. Quantitative Imaging in Medicine and Surgery, 4(6), 2014.
  53. Anatomical invariance modeling and semantic alignment for self-supervised learning in 3d medical image analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 15859–15869, 2023a.
  54. Layer grafted pre-training: Bridging contrastive learning and masked image modeling for label-efficient representations. In The Eleventh International Conference on Learning Representations, 2023b.
  55. Mimic-cxr, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6:317, 2019.
  56. Intermediate layers matter in momentum contrastive self supervised learning. In Advances in Neural Information Processing Systems, pages 24063–24074. Curran Associates, Inc., 2021.
  57. Labeled optical coherence tomography (oct) and chest x-ray images for classification, 2018.
  58. Understanding masked autoencoders via hierarchical latent variable models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7918–7928, 2023.
  59. Stacked capsule autoencoders. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2019.
  60. Prototypical contrastive learning of unsupervised representations, 2021.
  61. A structure-aware relation network for thoracic diseases detection and segmentation. IEEE Transactions on Medical Imaging, 40(8):2042–2052, 2021.
  62. Mixmae: Mixed and masked autoencoder for efficient pretraining of hierarchical vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6252–6261, 2023.
  63. Swin transformer: Hierarchical vision transformer using shifted windows. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9992–10002, 2021.
  64. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11976–11986, 2022.
  65. Radimagenet: An open radiologic deep learning research dataset for effective transfer learning. Radiology: Artificial Intelligence, 4(5):e210315, 2022.
  66. A simple, efficient and scalable contrastive masked autoencoder for learning visual representations, 2022.
  67. STREAMER: Streaming representation learning and event segmentation in a hierarchical manner. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  68. Lvm-med: Learning large-scale self-supervised vision models for medical imaging via second-order graph matching, 2023.
  69. Vindr-ribcxr: A benchmark dataset for automatic segmentation and labeling of individual ribs on chest x-rays. In Medical Imaging with Deep Learning, 2021.
  70. Vindr-cxr: An open dataset of chest x-rays with radiologist’s annotations, 2020.
  71. Unsupervised learning of dense visual representations. In Advances in Neural Information Processing Systems, pages 4489–4500. Curran Associates, Inc., 2020.
  72. POPAR: Patch order prediction and appearance recovery for self-supervised medical image analysis. In MICCAI Workshop on Domain Adaptation and Representation Transfer, pages 77–87. Springer, 2022.
  73. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, Cham, 2015. Springer International Publishing.
  74. Development of a digital image database for chest radiographs with and without a lung nodule. American Journal of Roentgenology, 174(1):71–74, 2000.
  75. Drishti-gs: Retinal image dataset for optic nerve head(onh) segmentation. In 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), pages 53–56, 2014.
  76. Multi-mode online knowledge distillation for self-supervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11848–11857, 2023.
  77. Rsna pneumonia detection challenge, 2018.
  78. Visual parser: Representing part-whole hierarchies with transformers, 2022.
  79. Towards foundation models learned from anatomy in medical imaging via self-supervision. arXiv:2309.15358, 2023.
  80. Self-supervised pre-training of swin transformers for 3d medical image analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20730–20740, 2022.
  81. Siamese image modeling for self-supervised vision representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2132–2141, 2023.
  82. Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research, 9(86):2579–2605, 2008.
  83. Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database. Medical Image Analysis, 10(1):19–40, 2006.
  84. Hard patches mining for masked image modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10375–10385, 2023a.
  85. Masked image modeling with local multi-scale reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2122–2131, 2023b.
  86. Covid-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images. Scientific Reports, 10(19549), 2020.
  87. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2097–2106, 2017.
  88. Dense contrastive learning for self-supervised visual pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3024–3033, 2021.
  89. Exploring set similarity for dense self-supervised representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16590–16599, 2022.
  90. Delving into masked autoencoders for multi-label thorax disease classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3588–3600, 2023.
  91. Region similarity representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10539–10548, 2021.
  92. Detco: Unsupervised contrastive learning for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8392–8401, 2021a.
  93. Propagate yourself: Exploring pixel-level consistency for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16684–16693, 2021b.
  94. Simmim: A simple framework for masked image modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9653–9663, 2022.
  95. On data scaling in masked image modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10365–10374, 2023.
  96. Regioncl: Exploring contrastive region pairs for self-supervised representation learning. In Computer Vision – ECCV 2022, pages 477–494, Cham, 2022. Springer Nature Switzerland.
  97. Instance localization for self-supervised detection pretraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3987–3996, 2021.
  98. Unsupervised embedding learning via invariant and spreading instance feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  99. Siim-acr pneumothorax segmentation, 2019.
  100. Barlow twins: Self-supervised learning via redundancy reduction. arXiv:2103.03230, 2021.
  101. Dual temperature helps contrastive learning without many negative samples: Towards understanding and simplifying moco. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14441–14450, 2022a.
  102. Zero-CL: Instance and feature decorrelation for negative-free symmetric contrastive learning. In International Conference on Learning Representations, 2022b.
  103. Patch-level contrasting without patch correspondence for accurate and dense contrastive representation learning. In The Eleventh International Conference on Learning Representations, 2023.
  104. Leverage your local and global representations: A new self-supervised learning strategy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16580–16589, 2022c.
  105. Preservational learning improves self-supervised medical image models by reconstructing diverse contexts. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3499–3509, 2021a.
  106. Models genesis: Generic autodidactic models for 3d medical image analysis. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, pages 384–393, Cham, 2019. Springer International Publishing.
  107. Models genesis. Medical Image Analysis, 67:101840, 2021b.
  108. Learning anatomically consistent embedding for chest radiography. In Proceedings of the 34th British Machine Vision Conference (BMVC 2023), 2023.
