
Comparison of Machine Learning Classification Algorithms and Application to the Framingham Heart Study (2402.15005v1)

Published 22 Feb 2024 in cs.LG and stat.ML

Abstract: The use of machine learning algorithms in healthcare can amplify social injustices and health inequities. While biases can arise and compound during problem selection, data collection, and outcome definition, this research addresses generalizability impediments that occur during the development and post-deployment of machine learning classification algorithms. Using the Framingham coronary heart disease data as a case study, we show how to effectively select a probability cutoff to convert a regression model for a dichotomous variable into a classifier. We then compare the sampling distribution of the predictive performance of eight machine learning classification algorithms under four training/testing scenarios to test their generalizability and their potential to perpetuate biases. We show that both Extreme Gradient Boosting and Support Vector Machine are flawed when trained on an unbalanced dataset. We introduce double discriminant scoring of type I and show that it is the most generalizable, as it consistently outperforms the other classification algorithms regardless of the training/testing scenario. Finally, we introduce a methodology to extract an optimal variable hierarchy for a classification algorithm, and illustrate it on the overall, male, and female Framingham coronary heart disease data.
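The abstract does not say which criterion the authors use to select the probability cutoff. As a rough illustration of the general idea only, the sketch below scans candidate cutoffs over a model's predicted probabilities and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). Youden's J is a common stand-in chosen here as an assumption, not the paper's method, and the function name `youden_cutoff` and the toy data are hypothetical.

```python
import numpy as np

def youden_cutoff(y_true, probs):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    Assumption: this criterion is illustrative only; the paper's own
    cutoff-selection rule is not given in the abstract.
    Requires at least one positive and one negative label.
    """
    y_true = np.asarray(y_true, dtype=bool)
    probs = np.asarray(probs, dtype=float)
    best_cut, best_j = 0.5, -1.0
    # Every distinct predicted probability is a candidate cutoff.
    for cut in np.unique(probs):
        pred = probs >= cut                # classify as positive at or above the cutoff
        sens = np.mean(pred[y_true])       # true positive rate
        spec = np.mean(~pred[~y_true])     # true negative rate
        j = sens + spec - 1.0
        if j > best_j:                     # ties keep the smallest cutoff
            best_cut, best_j = cut, j
    return float(best_cut), float(best_j)

# Hypothetical unbalanced toy data: three negatives, two positives.
labels = [0, 0, 0, 1, 1]
predicted = [0.1, 0.2, 0.3, 0.6, 0.7]
cutoff, j_stat = youden_cutoff(labels, predicted)
print(cutoff, j_stat)
```

On unbalanced data such as this, a cutoff chosen this way can differ markedly from the default 0.5, which is one reason the choice of cutoff matters when a regression model is turned into a classifier.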



Authors (1)