Multimodal Document Analytics for Banking Process Automation (2307.11845v2)
Abstract: Traditional banks face increasing competition from FinTechs in the rapidly evolving financial ecosystem. Raising operational efficiency is vital to address this challenge. Our study aims to improve the efficiency of document-intensive business processes in banking. To that end, we first review the landscape of business documents in the retail segment. Banking documents often contain text, layout, and visuals, suggesting that document analytics and process automation require more than plain NLP. To verify this and assess the incremental value of visual cues when processing business documents, we compare a recently proposed multimodal model called LayoutXLM to powerful text classifiers (e.g., BERT) and LLMs (e.g., GPT) in a case study related to processing company register extracts. The results confirm that incorporating layout information in a model substantially increases its performance. Interestingly, we also observed that more than 75% of the best model performance (in terms of the F1 score) can be achieved with as little as 30% of the training data. This shows that the demand for labeled data to set up a multimodal model can be moderate, which simplifies real-world applications of multimodal document analytics. Our study also sheds light on more specific practices in the scope of calibrating a multimodal banking document classifier, including the need for fine-tuning. In sum, the paper contributes original empirical evidence on the effectiveness and efficiency of multimodal models for document processing in the banking business and offers practical guidance on how to unlock this potential in day-to-day operations.
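The data-efficiency claim in the abstract compares the F1 score of a model trained on a fraction of the data with the best F1 score obtained. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. It assumes a macro-averaged F1 (the abstract does not say which averaging is used), and the function names `macro_f1` and `relative_performance`, as well as the toy labels in the usage lines, are hypothetical and only for illustration.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores.

    Assumption: macro averaging is used here for illustration only.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_performance(f1_subset, f1_best):
    """Share of the best F1 reached by a model trained on a data subset."""
    if f1_best <= 0:
        raise ValueError("f1_best must be positive")
    return f1_subset / f1_best


# Hypothetical toy labels, not taken from the paper.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
toy_score = macro_f1(y_true, y_pred)
```

Under this reading, a model trained on 30% of the data meets the reported threshold when `relative_performance(f1_at_30_percent, f1_best)` exceeds 0.75, where both arguments are F1 scores measured on the same held-out test set.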
Authors: Christopher Gerling, Stefan Lessmann