
Machine Learning Driven Biomarker Selection for Medical Diagnosis (2405.10345v1)

Published 16 May 2024 in q-bio.QM, cs.AI, and cs.LG

Abstract: Recent advances in experimental methods have enabled researchers to collect data on thousands of analytes simultaneously. This has led to correlational studies that associate molecular measurements with diseases such as Alzheimer's, liver, and gastric cancer. However, using thousands of biomarkers selected from the analytes is not practical for real-world medical diagnosis and is likely undesirable because of the risk of spurious correlations. In this study, we evaluate four methods for biomarker selection and four ML classifiers for identifying correlations, giving 16 approaches in all. We found that contemporary methods outperform previously reported logistic regression when 3 or 10 biomarkers are permitted. With specificity fixed at 0.9, ML approaches produced a sensitivity of 0.240 (3 biomarkers) and 0.520 (10 biomarkers), while standard logistic regression produced a sensitivity of 0.000 (3 biomarkers) and 0.040 (10 biomarkers). We also found that causal-based methods for biomarker selection performed best when fewer biomarkers were permitted, while univariate feature selection performed best when more biomarkers were permitted.
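The abstract reports sensitivity at a fixed specificity of 0.9. As a rough illustration of what that metric means, the sketch below picks the lowest score threshold at which at least 90% of negative samples are classified correctly, then measures the fraction of positive samples scoring above it. The function name, the thresholding rule (strictly greater than), and the toy data are assumptions made for illustration; the paper's own evaluation code is not shown here and may differ.

```python
import math


def sensitivity_at_specificity(y_true, scores, target_specificity=0.9):
    """Return (threshold, sensitivity) at the lowest threshold meeting the target specificity.

    Assumption: a sample is called positive when its score is strictly
    above the threshold; labels are 0 (negative) or 1 (positive).
    """
    negatives = sorted(s for s, y in zip(scores, y_true) if y == 0)
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative sample")

    # Number of negatives that must score at or below the threshold.
    # The small epsilon guards against floating-point round-up in the product.
    needed = math.ceil(target_specificity * len(negatives) - 1e-9)
    threshold = negatives[needed - 1] if needed > 0 else float("-inf")

    true_positives = sum(1 for s in positives if s > threshold)
    return threshold, true_positives / len(positives)


# Toy data (hypothetical): 10 negatives followed by 5 positives.
y_true = [0] * 10 + [1] * 5
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.90,
          0.30, 0.50, 0.60, 0.95, 0.40]

threshold, sensitivity = sensitivity_at_specificity(y_true, scores)
print(threshold, sensitivity)  # prints: 0.45 0.6
```

Under this definition, a sensitivity of 0.000 means that no positive sample scored above the threshold needed to keep specificity at 0.9.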

