Self-supervised Visualisation of Medical Image Datasets (2402.14566v2)
Abstract: Self-supervised learning methods based on data augmentations, such as SimCLR, BYOL, or DINO, yield semantically meaningful representations of image datasets and are widely used prior to supervised fine-tuning. A recent self-supervised learning method, $t$-SimCNE, uses contrastive learning to directly train a 2D representation suitable for visualisation. When applied to natural image datasets, $t$-SimCNE yields 2D visualisations with semantically meaningful clusters. In this work, we used $t$-SimCNE to visualise medical image datasets, including examples from dermatology, histology, and blood microscopy. We found that extending the set of data augmentations to include arbitrary rotations improved class separability compared with the data augmentations used for natural images. Our 2D representations show medically relevant structures and can be used to aid data exploration and annotation, improving on common approaches for data visualisation.
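The idea of adding arbitrary rotations to the augmentation set can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`rotate_nearest`, `random_view`, `positive_pair`) are hypothetical, nearest-neighbour sampling with zero fill is a simplifying assumption, and the horizontal flip merely stands in for whichever other augmentations are used. The sketch shows how two randomly rotated views of the same image could form a positive pair for contrastive training.

```python
import math
import random


def rotate_nearest(image, angle_deg, fill=0):
    """Rotate a 2D image (list of rows) about its centre by angle_deg degrees.

    Uses inverse mapping with nearest-neighbour sampling. Output pixels whose
    source location falls outside the image are set to `fill`.
    (Assumption: a real pipeline would likely use interpolation instead.)
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # For each output pixel, look up the source pixel it came from.
            dx, dy = x - cx, y - cy
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out


def random_view(image, rng):
    """One augmented view: random horizontal flip, then a rotation by an
    angle drawn uniformly from [0, 360) degrees (hypothetical recipe)."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    angle = rng.uniform(0.0, 360.0)
    return rotate_nearest(image, angle)


def positive_pair(image, rng):
    """Two independently augmented views of the same image."""
    return random_view(image, rng), random_view(image, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    view_a, view_b = positive_pair(img, rng)
    for row in view_a:
        print(row)
```

Unlike natural photographs, images such as histology patches or blood smears typically have no canonical orientation, which is presumably why a full range of rotation angles is a reasonable augmentation here.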