
An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification (2405.09756v1)

Published 16 May 2024 in cs.LG, cs.NE, and q-bio.GN

Abstract: In the ongoing effort to improve medical diagnostics, the integration of state-of-the-art machine learning methods has emerged as a promising research area. In molecular biology, there has been an explosion of data generated from multi-omics sequencing. Modern sequencing equipment can produce a large number of complex measurements per experiment, so traditional statistical methods struggle with such high-dimensional data. However, most of the information in these datasets is redundant or unrelated and can be reduced to significantly fewer variables without losing much information. Dimensionality reduction techniques are mathematical procedures that allow for this reduction; they have largely been developed in the statistics and machine learning disciplines. The other challenge in medical datasets is an imbalanced number of samples across classes, which leads to biased results in machine learning models. This study tackles these challenges with a neural network that incorporates an autoencoder to extract the latent space of the features and a Generative Adversarial Network (GAN) to generate synthetic samples. The latent space is the reduced-dimensional space that captures the meaningful features of the original data. Our model starts with feature selection to pick the discriminative features before feeding them to the neural network. The model then predicts the cancer outcome for different datasets. The proposed model outperformed other existing models, scoring an accuracy of 95.09% on the bladder cancer dataset and 88.82% on the breast cancer dataset.
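
The abstract's point that class imbalance biases results can be illustrated with a small sketch. It is not the paper's method or data: the 90/10 label split and the helper names `accuracy` and `balanced_accuracy` are assumptions made for illustration only. It shows that a classifier which always predicts the majority class can report high plain accuracy while detecting none of the minority samples.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels (assumption): 90 majority (0), 10 minority (1).
y_true = [0] * 90 + [1] * 10

# A trivial "model" that always predicts the most frequent class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bal:.2f}")
```

Under these assumed labels the plain accuracy looks strong even though every minority sample is misclassified, which is one reason oversampling the minority class (for example with synthetic samples) is commonly considered before training.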

