Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects (2311.14214v1)
Abstract: Data science projects often involve various ML methods that depend on data, code, and models. One of the key activities in these projects is selecting a model or algorithm appropriate for the data analysis at hand. ML model selection depends on several factors, including data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and non-functional requirements such as performance and bias. However, the factors that influence such selection are often not well understood or explicitly represented. This paper describes ongoing work on extending an adaptive, variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments that illustrate the method in a case study based on a heart failure prediction project. The proposed approach aims to advance the state of the art by making explicit the factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects into a non-ad-hoc, adaptive, and explainable process.
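The abstract describes selecting among candidate models using both performance and bias-related metrics. As a minimal, hypothetical sketch of that idea (the candidate names, the statistical-parity metric, and the `bias_weight` trade-off parameter are illustrative assumptions, not details from the paper), one could score each candidate's predictions on accuracy minus a weighted bias term:

```python
# Hypothetical sketch: variability-aware model selection that scores
# candidate model configurations on both accuracy and a bias metric.

def statistical_parity_difference(y_pred, group):
    """|P(pred=1 | group=0) - P(pred=1 | group=1)| -- lower is fairer."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return abs(rate(g0) - rate(g1))

def accuracy(y_pred, y_true):
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

def select_model(candidates, y_true, group, bias_weight=0.5):
    """Pick the candidate maximising accuracy minus weighted bias."""
    def score(preds):
        return (accuracy(preds, y_true)
                - bias_weight * statistical_parity_difference(preds, group))
    return max(candidates, key=lambda name: score(candidates[name]))

# Toy labels and a binary protected attribute (e.g., patient sex).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]

# Held-out predictions from two hypothetical trained models.
candidates = {
    "logistic_regression": [1, 0, 1, 1, 0, 0, 1, 0],  # accurate
    "decision_tree":       [1, 1, 1, 1, 0, 0, 0, 0],  # skewed toward group 0
}

best = select_model(candidates, y_true, group)
# best == "logistic_regression": its bias penalty is outweighed by accuracy
```

In a real instantiation, the candidate set and the relevant bias metrics would be derived from the feature model rather than hard-coded, which is what makes the selection process explicit and adaptive.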