Predicting loss-of-function impact of genetic mutations: a machine learning approach (2402.00054v1)

Published 26 Jan 2024 in q-bio.GN, cs.LG, and stat.AP

Abstract: The advent of next-generation sequencing (NGS) has substantially reduced the cost of genome sequencing, lowering barriers to medical research and making sequencing feasible in studies where it would previously have been cost-prohibitive. Identifying damaging or pathogenic mutations within large volumes of complex, high-dimensional sequencing data may therefore be of particular interest to researchers. Accordingly, this paper aimed to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores, which measure a gene's intolerance to loss-of-function mutations. These attributes included, among others, the position of a mutation on a chromosome and the amino acid and codon changes the mutation causes. Models were built by combining the univariate feature selection technique f-regression with K-nearest neighbors (KNN), Support Vector Machine (SVM), Random Sample Consensus (RANSAC), Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost) regressors. They were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance. Multiple models achieved testing-set r-squared values of 0.97.
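The abstract describes a pipeline of univariate f-regression feature selection followed by a regressor, scored with five-fold cross-validated averages of five metrics. The sketch below illustrates that workflow in dependency-free Python for just one of the listed models, KNN. It is not the authors' code: the helper names, the number of selected features, the number of neighbors, the shuffled fold assignment, and the synthetic data are all hypothetical. scikit-learn provides comparable building blocks as `SelectKBest` with `f_regression`, `KNeighborsRegressor`, and `cross_validate`.

```python
# Hypothetical sketch of f-regression selection + KNN with 5-fold CV; not the paper's implementation.
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def f_regression_scores(X, y):
    """Univariate F statistic per feature: F_j = r_j^2 / (1 - r_j^2) * (n - 2)."""
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        r2 = pearson_r([row[j] for row in X], y) ** 2
        r2 = min(r2, 1 - 1e-12)  # avoid dividing by zero for a perfectly correlated feature
        scores.append(r2 / (1 - r2) * (n - 2))
    return scores


def select_k_best(X, y, k):
    """Column indices of the k features with the highest F scores."""
    scores = f_regression_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(X_train, y_train, x, k):
    """Average target of the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(zip((math.dist(row, x) for row in X_train), y_train))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def regression_metrics(y_true, y_pred):
    """The five metrics named in the abstract, computed for one test fold."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    resid = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mean_resid = sum(resid) / n
    var_resid = sum((r - mean_resid) ** 2 for r in resid) / n
    return {
        "r2": 1 - ss_res / ss_tot,
        "mse": ss_res / n,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in resid) / n,
        "explained_variance": 1 - var_resid / (ss_tot / n),
    }


def cross_validated_scores(X, y, n_features, n_neighbors, n_folds=5, seed=0):
    """Average each metric over n_folds shuffled folds (f-regression selection + KNN)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)  # assumption: shuffled folds with a fixed seed
    folds = [idx[i::n_folds] for i in range(n_folds)]
    totals = {}
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        y_train = [y[i] for i in train_idx]
        # Score features on the training folds only, so the test fold cannot leak in.
        cols = select_k_best([X[i] for i in train_idx], y_train, n_features)
        X_train = [[X[i][j] for j in cols] for i in train_idx]
        preds = [knn_predict(X_train, y_train, [X[i][j] for j in cols], n_neighbors)
                 for i in test_idx]
        for name, value in regression_metrics([y[i] for i in test_idx], preds).items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / n_folds for name, total in totals.items()}


if __name__ == "__main__":
    # Synthetic stand-in data (not LoFtool data): only columns 0 and 1 carry signal.
    rng = random.Random(42)
    X = [[rng.random() for _ in range(6)] for _ in range(200)]
    y = [3 * row[0] + 2 * row[1] + rng.gauss(0, 0.05) for row in X]
    print(select_k_best(X, y, 2))
    print(cross_validated_scores(X, y, n_features=2, n_neighbors=5))
```

Fitting the feature selector inside each training fold, rather than once on the full dataset, keeps the held-out fold from influencing which columns are chosen; the same loop structure would apply if KNN were swapped for any of the other regressors listed in the abstract.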
