A machine learning workflow to address credit default prediction (2403.03785v1)
Abstract: Due to the recent surge of interest in Financial Technology (FinTech), applications such as credit default prediction (CDP) are gaining significant industrial and academic attention. CDP plays a crucial role in assessing the creditworthiness of individuals and businesses, enabling lenders to make informed decisions about loan approvals and risk management. In this paper, we propose a workflow-based approach to improve CDP, the task of estimating the probability that a borrower will default on his or her credit obligations. The workflow consists of multiple steps, each designed to leverage the strengths of different techniques found in machine learning pipelines and thus best solve the CDP task. We take a comprehensive and systematic approach, starting with data preprocessing using Weight of Evidence (WoE) encoding, a technique that, in a single shot, scales the data, removes outliers, handles missing values, and makes the data uniform for models that work with different data types. Next, we train several families of learning models, introducing ensemble techniques to build more robust models and hyperparameter optimization via multi-objective genetic algorithms to account for both predictive accuracy and financial aspects. Our research aims to contribute to the FinTech industry by providing a tool that moves toward more accurate and reliable credit risk assessment, benefiting both lenders and borrowers.
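The WoE preprocessing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps in simple equal-frequency (quantile) binning for whatever binning procedure the paper uses, reserves a dedicated bin for missing values, codes labels as 1 = default and 0 = non-default, and adds a hypothetical smoothing constant `alpha` so that bins with no defaults (or no non-defaults) still receive a finite value. It uses the convention WoE_i = ln((g_i / G) / (b_i / B)), where g_i and b_i are the non-default and default counts in bin i and G and B are the totals; some tools use the reciprocal ratio, which only flips the sign. The names `MISSING`, `quantile_edges`, `bin_of`, `fit_woe`, and `transform` are illustrative.

```python
import bisect
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

MISSING = -1  # hypothetical bin id reserved for missing values


def quantile_edges(values: Sequence[Optional[float]], n_bins: int) -> List[float]:
    """Equal-frequency cut points computed on the non-missing values."""
    xs = sorted(v for v in values if v is not None)
    edges: List[float] = []
    for k in range(1, n_bins):
        cut = xs[(k * len(xs)) // n_bins]
        if not edges or cut > edges[-1]:  # skip duplicate cut points
            edges.append(cut)
    return edges


def bin_of(x: Optional[float], edges: List[float]) -> int:
    """Bin 0 is (-inf, edges[0]); the last bin is [edges[-1], +inf)."""
    return MISSING if x is None else bisect.bisect_right(edges, x)


def fit_woe(
    values: Sequence[Optional[float]],
    labels: Sequence[int],
    n_bins: int = 4,
    alpha: float = 0.5,
) -> Tuple[List[float], Dict[int, float]]:
    """Learn bin edges and one WoE value per bin; labels are 1 = default, 0 = non-default."""
    edges = quantile_edges(values, n_bins)
    bins = [bin_of(x, edges) for x in values]
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    ids = sorted(set(bins))
    k = len(ids)
    total_bad, total_good = sum(bad.values()), sum(good.values())
    table: Dict[int, float] = {}
    for b in ids:
        # Smoothed share of all non-defaults and of all defaults that fall in bin b.
        p_good = (good[b] + alpha) / (total_good + alpha * k)
        p_bad = (bad[b] + alpha) / (total_bad + alpha * k)
        table[b] = math.log(p_good / p_bad)
    return edges, table


def transform(values: Sequence[Optional[float]], edges: List[float],
              table: Dict[int, float]) -> List[float]:
    """Replace each raw value by the WoE of its bin; bins unseen in fitting map to 0.0 (neutral)."""
    return [table.get(bin_of(x, edges), 0.0) for x in values]


if __name__ == "__main__":
    # Toy feature where larger values default more often; None marks a missing value.
    x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 50.0, None]
    y = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
    edges, table = fit_woe(x, y, n_bins=4)
    print(edges, table)
    print(transform([0.15, 1000.0, None], edges, table))
```

Because every raw value, including an extreme one such as 1000.0 or a missing entry, is replaced by the WoE of its bin, the encoded feature lives on a single log-odds scale. In this sketch that is what lets one encoding act as scaling, outlier handling, and missing-value handling at once.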
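The multi-objective hyperparameter optimization mentioned in the abstract ranks candidate models on predictive accuracy and on financial aspects at the same time. A minimal sketch of that ranking step, assuming the fast non-dominated sorting used by NSGA-II (the paper's exact genetic algorithm, objectives, and fitness definitions are not stated here), gives each candidate configuration a tuple of scores to be maximized and splits the candidates into Pareto fronts. The configuration names and the (AUC, negated expected loss) scores in the usage block are hypothetical. A complete genetic algorithm would wrap this in a loop of selection, crossover, and mutation, and NSGA-II additionally uses crowding distance to order candidates within a front; both are omitted.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one.

    All objectives are assumed to be maximized (negate costs before calling)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(scores: Sequence[Sequence[float]]) -> List[List[int]]:
    """Split candidate indices into Pareto fronts, best front first (NSGA-II-style)."""
    n = len(scores)
    dominated: List[List[int]] = [[] for _ in range(n)]  # indices that p dominates
    n_dominators = [0] * n  # how many candidates dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(scores[p], scores[q]):
                dominated[p].append(q)
            elif dominates(scores[q], scores[p]):
                n_dominators[p] += 1
        if n_dominators[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated[p]:
                n_dominators[q] -= 1
                if n_dominators[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # Hypothetical candidate configurations scored as (AUC, -expected_loss).
    names = ["depth=3", "depth=5", "depth=8", "depth=2"]
    scores = [(0.80, -120.0), (0.84, -150.0), (0.83, -170.0), (0.76, -110.0)]
    for rank, front in enumerate(non_dominated_sort(scores)):
        print(rank, [names[i] for i in front])
```

With these toy scores, depth=8 is the only candidate pushed to the second front, because depth=5 beats it on both AUC and expected loss. The other three trade accuracy against cost, so none of them dominates another, and the choice among them is left to the modeller or to the next generation of the search.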