Supervised Multiple Kernel Learning approaches for multi-omics data integration (2403.18355v2)
Abstract: Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. Integrating multiple heterogeneous data sources remains an open challenge in biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach for handling the diverse nature of multi-omics inputs, yet it remains an underused tool in genomic data mining. We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches. Multiple kernel learning offers a natural framework for predictive models in multi-omics data, and it proved to be a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery, and further development of methods for heterogeneous data integration.
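The general idea of learning from a meta-kernel with a support vector machine can be illustrated with a minimal sketch. It is not the authors' method: the uniform-weight average used as the fusion step, the RBF kernel, the `1 / n_features` bandwidth heuristic, the helper names `rbf_kernel` and `fuse_kernels`, and the synthetic "omics" blocks are all assumptions chosen for illustration. The sketch builds one kernel per data block, combines them into a single meta-kernel, and passes that matrix to an SVM that accepts precomputed kernels.

```python
import numpy as np
from sklearn.svm import SVC


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def fuse_kernels(kernels, weights=None):
    """Convex combination of kernel matrices (uniform weights by default).

    Assumption: a plain average stands in for a learned fusion strategy.
    """
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))


# Synthetic stand-ins for two omics blocks measured on the same samples.
rng = np.random.default_rng(0)
n_samples = 80
y = np.repeat([0, 1], n_samples // 2)
omics_a = rng.normal(size=(n_samples, 50)) + 0.8 * y[:, None]
omics_b = rng.normal(size=(n_samples, 30)) + 0.5 * y[:, None]

# Random train/test split over sample indices.
perm = rng.permutation(n_samples)
train, test = perm[:60], perm[60:]

blocks = [omics_a, omics_b]
gammas = [1.0 / b.shape[1] for b in blocks]  # assumed bandwidth heuristic

# One kernel per block, then a single fused meta-kernel.
K_train = fuse_kernels(
    [rbf_kernel(b[train], b[train], g) for b, g in zip(blocks, gammas)]
)
# Test kernel: rows are test samples, columns are training samples.
K_test = fuse_kernels(
    [rbf_kernel(b[test], b[train], g) for b, g in zip(blocks, gammas)]
)

clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y[train])
accuracy = clf.score(K_test, y[test])
print(f"test accuracy: {accuracy:.2f}")
```

Because a convex combination of positive semi-definite matrices is itself positive semi-definite, the fused matrix is still a valid kernel. Replacing the uniform weights with learned ones, or swapping the averaging step for another fusion strategy, leaves the SVM stage unchanged.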