Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending (2401.16458v2)
Abstract: Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism, linking borrowers with lenders through online platforms. However, P2P lending faces the challenge of information asymmetry, as lenders often lack sufficient data to assess the creditworthiness of borrowers. This paper proposes a novel approach to addressing this issue by leveraging the textual descriptions provided by borrowers during the loan application process. Our methodology processes these textual descriptions with a large language model (LLM), a powerful tool capable of discerning patterns and semantics within text, and applies transfer learning to adapt the LLM to the task at hand. Our results on the Lending Club dataset show that the risk score generated by BERT, a widely used LLM, significantly improves the performance of credit risk classifiers. However, the inherent opacity of LLM-based systems, coupled with uncertainty about potential biases, raises critical considerations for regulatory frameworks and trust-related concerns among end users, opening new avenues for future research at the intersection of P2P lending and artificial intelligence.
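The abstract describes a two-stage design: a fine-tuned LLM turns each loan description into a risk score, and that score is appended to the usual tabular features before training a credit risk classifier. The sketch below is a minimal, illustrative version of that design under stated assumptions, not the paper's implementation: a TF-IDF plus logistic-regression scorer stands in for the fine-tuned BERT model, scikit-learn's GradientBoostingClassifier stands in for the downstream classifier, and all data is synthetic.

```python
# Illustrative two-stage pipeline: text -> risk score -> augmented classifier.
# The text scorer below is a stand-in for the paper's fine-tuned BERT model.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic loan applications: a free-text description plus two tabular
# features (e.g. income and debt-to-income ratio); 1 = default.
descriptions = (
    ["need cash urgently to cover overdue bills and mounting debt"] * 50
    + ["consolidating two loans into one payment, stable salaried income"] * 50
)
labels = np.array([1] * 50 + [0] * 50)
tabular = rng.normal(size=(100, 2))

# Stage 1: learn a text-only default model and use its predicted
# probability of default as the "risk score" feature.
vec = TfidfVectorizer()
X_text = vec.fit_transform(descriptions)
text_model = LogisticRegression().fit(X_text, labels)
risk_score = text_model.predict_proba(X_text)[:, 1]

# Stage 2: append the risk score to the tabular features and train
# the final credit risk classifier on the augmented feature set.
X_aug = np.hstack([tabular, risk_score[:, None]])
clf = GradientBoostingClassifier(random_state=0).fit(X_aug, labels)
print(round(clf.score(X_aug, labels), 2))
```

In the paper's setting, stage 1 would presumably be replaced by a BERT model fine-tuned on (description, default) pairs, with the softmax output of its classification head playing the role of `predict_proba`; the two-stage structure itself is unchanged.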
- Miller Janny Ariza-Garzón, Javier Arroyo, Antonio Caparrini, and María-Jesús Segovia-Vargas. 2020. Explainability of a Machine Learning Granting Scoring Model in Peer-to-Peer Lending. IEEE Access 8 (2020), 64873–64890. https://doi.org/10.1109/ACCESS.2020.2984412
- Risk-return modelling in the p2p lending market: Trends, gaps, recommendations and future directions. Electronic Commerce Research and Applications 49 (2021), 101079. https://doi.org/10.1016/j.elerap.2021.101079
- Adithya Bhaskar, Alexander R. Fabbri, and Greg Durrett. 2022. Prompted Opinion Summarization with GPT-3.5. arXiv preprint arXiv:2211.15914 (2022). https://doi.org/10.48550/arXiv.2211.15914
- Magdalena Biesialska, Katarzyna Biesialska, and Marta R. Costa-jussà. 2020. Continual Lifelong Learning in Natural Language Processing: A Survey. In Proceedings of the 28th International Conference on Computational Linguistics, Donia Scott, Nuria Bel, and Chengqing Zong (Eds.). International Committee on Computational Linguistics, Barcelona, Spain (Online), 6523–6541. https://doi.org/10.18653/v1/2020.coling-main.574
- Tom B. Brown et al. 2020. Language Models are Few-Shot Learners. arXiv preprint arXiv:2005.14165 (2020). https://doi.org/10.48550/ARXIV.2005.14165
- José Cañete, Gabriel Chaperon, Rodrigo Fuentes, Jou-Hui Ho, Hojin Kang, and Jorge Pérez. 2020. Spanish Pre-Trained BERT Model and Evaluation Data. In Practical ML for Developing Countries Workshop at ICLR 2020. https://doi.org/10.48550/arXiv.2308.02976
- Mark Chen et al. 2021. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374 (2021). https://doi.org/10.48550/arXiv.2107.03374
- Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, NY, USA, 785–794. https://doi.org/10.1145/2939672.2939785
- Addressing Information Asymmetries in Online Peer-to-Peer Lending. Springer International Publishing, Cham, 15–31. https://doi.org/10.1007/978-3-030-02330-0_2
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018). https://doi.org/10.48550/ARXIV.1810.04805
- Adji B. Dieng, Francisco J. R. Ruiz, and David M. Blei. 2020. Topic Modeling in Embedding Spaces. Transactions of the Association for Computational Linguistics 8 (2020), 439–453. https://doi.org/10.1162/tacl_a_00325
- Gregor Dorfleitner et al. 2016. Description-text related soft information in peer-to-peer lending – Evidence from two leading European platforms. Journal of Banking & Finance 64 (2016), 169–187. https://doi.org/10.1016/j.jbankfin.2015.11.009
- Qiang Gao and Mingfeng Lin. 2015. Lemon or Cherry? The Value of Texts in Debt Crowdfunding. Technical Report 18. Center for Analytical Finance. University of California, Santa Cruz. https://cafin.ucsc.edu/research/work_papers/CAFIN_WP18.pdf
- Zhengjie Gao, Ao Feng, Xinyu Song, and Xi Wu. 2019. Target-Dependent Sentiment Classification With BERT. IEEE Access 7 (2019), 154290–154299. https://doi.org/10.1109/ACCESS.2019.2946594
- Michal Herzenstein, Scott Sonenshein, and Utpal M. Dholakia. 2011. Tell Me a Good Story and I May Lend You My Money: The Role of Narratives in Peer-to-Peer Lending Decisions. SSRN Electronic Journal (2011). https://doi.org/10.2139/ssrn.1840668
- John H. Holland. 1992. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. The MIT Press, Cambridge, Massachusetts, USA. https://doi.org/10.7551/mitpress/1090.001.0001
- Cuiqing Jiang, Zhao Wang, Ruiya Wang, and Yong Ding. 2017. Loan default prediction by combining soft information extracted from descriptive text in online peer-to-peer lending. Annals of Operations Research 266, 1–2 (Oct. 2017), 511–529. https://doi.org/10.1007/s10479-017-2668-z
- Johannes Kriebel and Lennart Stitz. 2022. Credit default prediction from user-generated text in peer-to-peer lending using deep learning. European Journal of Operational Research 302, 1 (Oct. 2022), 309–323. https://doi.org/10.1016/j.ejor.2021.12.024
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv preprint arXiv:1909.11942 (2019). https://doi.org/10.48550/ARXIV.1909.11942
- Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2019. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 4 (2019), 1234–1240. https://doi.org/10.1093/bioinformatics/btz682
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv preprint arXiv:1910.13461 (2019). https://doi.org/10.48550/arXiv.1910.13461
- Network topology and systemic risk in Peer-to-Peer lending market. Physica A: Statistical Mechanics and its Applications 508 (2018), 118–130. https://doi.org/10.1016/j.physa.2018.05.083
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692 (2019). https://doi.org/10.48550/ARXIV.1907.11692
- Tim Loughran and Bill McDonald. 2011. When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. The Journal of Finance 66, 1 (2011), 35–65. https://doi.org/10.1111/j.1540-6261.2010.01625.x
- Scott M. Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 4768–4777.
- Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric de la Clergerie, Djamé Seddah, and Benoît Sagot. 2020. CamemBERT: a Tasty French Language Model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 7203–7219. https://doi.org/10.18653/v1/2020.acl-main.645
- Jeremy Michels. 2012. Do Unverifiable Disclosures Matter? Evidence from Peer-to-Peer Lending. The Accounting Review 87, 4 (2012), 1385–1413.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781 (2013). https://doi.org/10.48550/ARXIV.1301.3781
- David Pride, Matteo Cancellieri, and Petr Knoth. 2023. CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering. arXiv preprint arXiv:2307.04683 (2023). https://doi.org/10.48550/arXiv.2307.04683
- Do Facial Images Matter? Understanding the Role of Private Information Disclosure in Crowdfunding Markets. Electronic Commerce Research and Applications 54, C (jul 2022), 14 pages. https://doi.org/10.1016/j.elerap.2022.101173
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. Technical Report. OpenAI. https://insightcivic.s3.us-east-1.amazonaws.com/language-models.pdf
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 1, Article 140 (2020), 67 pages.
- ROFIEG. 2019. Thirty recommendations on regulation, innovation and finance. Final Report to the European Commission by the Expert Group on Regulatory Obstacles to Financial Innovation. Technical Report. European Commission. https://ec.europa.eu/info/files/191113-report-expert-group-regulatory-obstacles-financial-innovation_en
- Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019). https://doi.org/10.48550/ARXIV.1910.01108
- Michael Siering. 2023. Peer-to-Peer (P2P) Lending Risk Management: Assessing Credit Risk on Social Lending Platforms Using Textual Factors. ACM Transactions on Management Information Systems 14, 3, Article 25 (jun 2023), 19 pages. https://doi.org/10.1145/3589003
- Matthew Stevenson, Christophe Mues, and Cristián Bravo. 2021. The value of text for small business default prediction: A Deep Learning approach. European Journal of Operational Research 295, 2 (Dec. 2021), 758–771. https://doi.org/10.1016/j.ejor.2021.03.008
- Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to Fine-Tune BERT for Text Classification? In Chinese Computational Linguistics, Maosong Sun, Xuanjing Huang, Heng Ji, Zhiyuan Liu, and Yang Liu (Eds.). Springer International Publishing, Cham, 194–206.
- Xiaofei Sun et al. 2023. Text Classification via Large Language Models. arXiv preprint arXiv:2305.08377 (2023). https://doi.org/10.48550/ARXIV.2305.08377
- Xu Sun and Weichao Xu. 2014. Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves. IEEE Signal Processing Letters 21, 11 (2014), 1389–1393. https://doi.org/10.1109/LSP.2014.2337313
- Vijay Srinivas Tida and Sonya Hy Hsu. 2022. Universal Spam Detection using Transfer Learning of BERT Model. In Proceedings of the 55th Hawaii International Conference on System Sciences. 7669–7677. http://hdl.handle.net/10125/80263
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. arXiv preprint arXiv:1706.03762 (2017). https://doi.org/10.48550/ARXIV.1706.03762
- Credit Risk Evaluation Based on Text Analysis. International Journal of Cognitive Informatics and Natural Intelligence 10 (2016), 1–11. https://doi.org/10.4018/IJCINI.2016010101
- Predicting loan default in peer-to-peer lending using narrative data. Journal of Forecasting 39, 2 (2020), 260–280. https://doi.org/10.1002/for.2625
- Identifying features for detecting fraudulent loan requests on P2P platforms. In 2016 IEEE Conference on Intelligence and Security Informatics (ISI). 79–84. https://doi.org/10.1109/ISI.2016.7745447
- Peer-to-Peer Loan Fraud Detection: Constructing Features from Transaction Data. MIS Quarterly 45, 3 (Sept. 2022), 1777–1792. https://doi.org/10.25300/misq/2022/16103
- The relationship between soft information in loan titles and online peer-to-peer lending: evidence from RenRenDai platform. Electronic Commerce Research 19, 1 (2018), 111–129. https://doi.org/10.1007/s10660-018-9293-z
- Credit risk evaluation model with textual features from loan descriptions for P2P lending. Electronic Commerce Research and Applications 42 (2020), 100989. https://doi.org/10.1016/j.elerap.2020.100989