Simplicity within biological complexity (2405.09595v1)
Abstract: Heterogeneous, interconnected, systems-level molecular data have become increasingly available and are key to precision medicine. We need to utilize them to better stratify patients into risk groups, discover new biomarkers and targets, and repurpose known drugs and discover new ones, in order to personalize medical treatment. Existing methodologies are limited, and a paradigm shift is needed to achieve quantitative and qualitative breakthroughs. In this perspective paper, we survey the literature and argue for the development of a comprehensive, general framework for embedding multi-scale molecular network data that would enable their explainable exploitation in precision medicine in linear time. Network embedding methods map nodes to points in a low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships. They have recently achieved unprecedented performance on hard problems of utilizing few omic data in various biomedical applications. However, research thus far has been limited to special variants of the problems and data, with performance depending on the underlying topology-function network biology hypotheses, the biomedical applications, and the evaluation metrics. The availability of multi-omic data, modern graph embedding paradigms, and compute power call for the creation and training of efficient, explainable, and controllable models that exhibit no potentially dangerous, unexpected behaviour and that deliver a qualitative breakthrough. We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to an efficient and scalable software implementation, and to apply it to biomedical informatics. It will lead to a paradigm shift in the computational and biomedical understanding of data and diseases that will open up ways of solving some of the major bottlenecks in precision medicine and other domains.
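To make the notion of network embedding in the abstract concrete, the following is a minimal sketch, not the framework the authors propose. It assumes a classical spectral embedding (eigenvectors of the graph Laplacian) as a stand-in technique; the function name `spectral_embedding` and the toy graph are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectral_embedding(adj: np.ndarray, dim: int) -> np.ndarray:
    """Map each node to a point in `dim` dimensions using the
    eigenvectors of the unnormalized graph Laplacian L = D - A."""
    degree = np.diag(adj.sum(axis=1))
    laplacian = degree - adj
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    _, eigvecs = np.linalg.eigh(laplacian)
    # Skip the first eigenvector, which is constant for a connected graph.
    return eigvecs[:, 1:dim + 1]


# Hypothetical toy network: two triangles {0,1,2} and {3,4,5}
# joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

emb = spectral_embedding(adj, dim=2)

# The first coordinate (the Fiedler vector) places the two triangles
# on opposite sides of zero, so proximity along it reflects topology.
first = emb[:, 0]
print(np.round(first, 3))
```

In this toy case the first embedding coordinate alone separates the two densely connected groups. Real omic networks would need embeddings that also capture functional annotations, which this sketch does not attempt.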
Authors: Noel Malod-Dognin, Natasa Przulj