An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification

Published 16 May 2024 in cs.LG, cs.NE, and q-bio.GN (arXiv:2405.09756v1)

Abstract: In the ongoing effort to enhance medical diagnostics, integrating state-of-the-art machine learning methods has emerged as a promising research area. In molecular biology, multi-omics sequencing has produced an explosion of data. Modern sequencing equipment can provide a large number of complex measurements in a single experiment, so traditional statistical methods struggle with such high-dimensional data. However, most of the information in these datasets is redundant or irrelevant and can be reduced to significantly fewer variables without losing much information. Dimensionality reduction techniques are mathematical procedures that perform this reduction; they have largely been developed within statistics and machine learning. Another challenge in medical datasets is class imbalance, an unequal number of samples across classes, which biases the results of machine learning models. This study tackles both challenges with a neural network that incorporates an autoencoder to extract a latent space of the features and a Generative Adversarial Network (GAN) to generate synthetic samples. The latent space is the reduced-dimensional space that captures the meaningful features of the original data. The model first applies feature selection to choose discriminative features before feeding them to the neural network, and then predicts the cancer outcome for different datasets. The proposed model outperformed the existing models it was compared with, achieving an accuracy of 95.09% on the bladder cancer dataset and 88.82% on the breast cancer dataset.
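The abstract describes the pipeline only at a high level: feature selection, an autoencoder latent space, GAN-generated synthetic samples to offset class imbalance, then a classifier. The PyTorch sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The Welch t-statistic filter, the layer sizes, the training lengths, running the GAN in the autoencoder's latent space, the binary-label setting, and all names (`select_features`, `gan_oversample`, `fit`, `pipeline`) are hypothetical choices made for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible toy run


def select_features(x, y, k):
    """Hypothetical filter: rank features by |Welch t-statistic| (binary labels), keep top k."""
    a, b = x[y == 0], x[y == 1]
    t = (a.mean(0) - b.mean(0)).abs() / torch.sqrt(a.var(0) / len(a) + b.var(0) / len(b) + 1e-8)
    return torch.topk(t, k).indices


class Autoencoder(nn.Module):
    """Compresses the selected features into a low-dimensional latent space."""

    def __init__(self, n_features, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))


def fit(model, loss_fn, epochs=200, lr=1e-3):
    """Minimal Adam loop; loss_fn() recomputes the loss from the current parameters."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = loss_fn()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


def gan_oversample(z_min, n_new, noise_dim=8, steps=300, lr=2e-4):
    """Trains a small GAN on minority-class latent vectors, then draws n_new synthetic ones."""
    n, d = z_min.shape
    gen = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, d))
    disc = nn.Sequential(nn.Linear(d, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
    opt_g = torch.optim.Adam(gen.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(disc.parameters(), lr=lr, betas=(0.5, 0.999))
    bce = nn.BCEWithLogitsLoss()
    real_lbl, fake_lbl = torch.ones(n, 1), torch.zeros(n, 1)
    for _ in range(steps):
        # Discriminator: label real minority latents 1, generated latents 0.
        fake = gen(torch.randn(n, noise_dim)).detach()
        loss_d = bce(disc(z_min), real_lbl) + bce(disc(fake), fake_lbl)
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()
        # Generator: push the discriminator to label fresh fakes as real.
        loss_g = bce(disc(gen(torch.randn(n, noise_dim))), real_lbl)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
    with torch.no_grad():
        return gen(torch.randn(n_new, noise_dim))


def pipeline(x, y, k=50, latent_dim=16, epochs=200):
    """Feature selection -> autoencoder -> GAN oversampling in latent space -> classifier."""
    idx = select_features(x, y, k)
    xs = x[:, idx]
    ae = Autoencoder(k, latent_dim)
    fit(ae, lambda: nn.functional.mse_loss(ae(xs), xs), epochs)
    with torch.no_grad():
        z = ae.encoder(xs)
    counts = torch.bincount(y)
    minority = int(counts.argmin())
    n_new = int(counts.max() - counts.min())  # top up the (single) minority class
    z_bal = torch.cat([z, gan_oversample(z[y == minority], n_new)])
    y_bal = torch.cat([y, torch.full((n_new,), minority, dtype=y.dtype)])
    clf = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, len(counts)))
    fit(clf, lambda: nn.functional.cross_entropy(clf(z_bal), y_bal), epochs)
    return idx, ae, clf, y_bal


# Toy imbalanced data (random, NOT the paper's datasets): 90 vs 10 samples, 200 features.
x = torch.randn(100, 200)
y = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
x[y == 1, :5] += 2.0  # make the first five features informative
idx, ae, clf, y_bal = pipeline(x, y)
print(torch.bincount(y_bal))  # tensor([90, 90])
```

Generating synthetic samples in the latent space keeps the GAN small, since it models 16 dimensions instead of the full feature set; the abstract does not state whether the authors' GAN works on raw or latent features, so this is one plausible design rather than the paper's. With more than two classes, each under-represented class would need its own top-up, and the t-statistic filter would need a multi-class replacement such as an ANOVA F-score.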
