Metadata-guided Consistency Learning for High Content Images (2212.11595v2)
Abstract: High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-supervised learning methods have shown great success on natural images and offer an attractive alternative for microscopy images as well. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is batch effects: undesirable domain shifts in the data caused by biological noise or uncontrolled experimental conditions. To address this, we introduce Cross-Domain Consistency Learning (CDCL), a self-supervised approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, leading to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks, such as distinguishing treatments and mechanisms of action.
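The abstract does not spell out how CDCL uses metadata, so the following is only an illustrative sketch of one plausible reading: positive pairs are formed from images that share a treatment but come from different batches, so that a consistency objective cannot rely on batch-specific signal. The function name `sample_cross_batch_pairs`, the `(image_id, treatment, batch)` record layout, and the example identifiers are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def sample_cross_batch_pairs(records, rng):
    """Pair each image with one image of the same treatment from another batch.

    `records` is a list of (image_id, treatment, batch) tuples. This layout is
    an assumption made for illustration, not the authors' data format.
    """
    # Group image ids by treatment, then by batch.
    by_treatment = defaultdict(lambda: defaultdict(list))
    for image_id, treatment, batch in records:
        by_treatment[treatment][batch].append(image_id)

    pairs = []
    for by_batch in by_treatment.values():
        if len(by_batch) < 2:
            continue  # treatment seen in a single batch: no cross-batch partner
        for batch, image_ids in by_batch.items():
            other_batches = [b for b in by_batch if b != batch]
            for anchor in image_ids:
                partner_batch = rng.choice(other_batches)
                partner = rng.choice(by_batch[partner_batch])
                pairs.append((anchor, partner))
    return pairs


# Hypothetical metadata: treatments imaged on two plates (batches).
records = [
    ("img0", "dmso", "plate1"),
    ("img1", "dmso", "plate2"),
    ("img2", "taxol", "plate1"),
    ("img3", "taxol", "plate2"),
    ("img4", "taxol", "plate2"),
    ("img5", "nocodazole", "plate1"),  # only one batch, so it is skipped
]
pairs = sample_cross_batch_pairs(records, random.Random(0))
```

In a self-supervised setup, the two images of each pair would take the place of the two augmented views of a single image; how the pairs are actually constructed and which loss is applied are details the abstract leaves open.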
- Johan Fredin Haslum
- Christos Matsoukas
- Karl-Johan Leuchowius
- Erik Müllers
- Kevin Smith