
Identifying and examining machine learning biases on Adult dataset (2310.09373v1)

Published 13 Oct 2023 in cs.CY and cs.LG

Abstract: This research delves into the reduction of machine learning model bias through ensemble learning. Our methodology comprehensively assesses bias across various categorical variables and reveals a pronounced bias on the gender attribute. The empirical evidence shows a substantial gender-based wage prediction disparity: the wage predicted for a male, initially $902.91, decreases to $774.31 when the gender attribute is flipped to female. Notably, Kullback-Leibler divergence scores point to gender bias, with values exceeding 0.13, predominantly within tree-based models. Ensemble learning is employed in pursuit of fairness and transparency. Intriguingly, our findings reveal that the stacked model aligns with the individual models, confirming the persistence of model bias. This study underscores ethical considerations and advocates the use of hybrid models for a data-driven society marked by impartiality and inclusivity.
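The two bias probes the abstract describes — flipping the protected attribute and comparing predictions, and scoring distributional differences with Kullback-Leibler divergence — can be sketched as below. This is an illustrative stand-in, not the paper's pipeline: the model, feature names, and the hard-coded 128.60 gap (902.91 − 774.31) are all hypothetical.

```python
import math

def kl_divergence(p, q):
    # D_KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def counterfactual_gap(model, rows, attr="gender", a="Male", b="Female"):
    # Average change in prediction when the protected attribute is flipped a -> b,
    # holding every other feature fixed.
    gaps = [model({**row, attr: a}) - model({**row, attr: b}) for row in rows]
    return sum(gaps) / len(gaps)

def toy_wage_model(row):
    # Hypothetical stand-in for a trained regressor; the +128.60 term encodes
    # the kind of gender gap the paper reports.
    wage = 500.0 + 10.0 * row["education_years"]
    if row["gender"] == "Male":
        wage += 128.60
    return wage

rows = [{"education_years": e, "gender": "Male"} for e in (10, 12, 16)]
print(round(counterfactual_gap(toy_wage_model, rows), 2))  # 128.6

# KL divergence between two toy prediction distributions; scores above the
# paper's 0.13 threshold would flag gender bias.
print(round(kl_divergence([0.9, 0.1], [0.5, 0.5]), 3))
```

In the paper's setting the same flip-and-compare loop would be run over the Adult test set for each trained model (tree-based, boosted, and stacked), with the KL score computed between the prediction distributions for the original and flipped cohorts.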

Citations (2)


Authors (1)