Towards reducing hallucination in extracting information from financial reports using Large Language Models (2310.10760v1)
Abstract: For a financial analyst, the question and answer (Q&A) segment of a company's financial report is a crucial input to analysis and investment decisions. However, extracting valuable insights from the Q&A section poses considerable challenges: conventional methods such as detailed reading and note-taking lack scalability and are susceptible to human error, while Optical Character Recognition (OCR) and similar techniques have difficulty accurately processing unstructured transcript text and often miss the subtle linguistic nuances that drive investor decisions. Here, we demonstrate the use of Large Language Models (LLMs) to extract information from earnings report transcripts efficiently and rapidly while maintaining high accuracy. Our approach transforms the extraction process and reduces hallucination by combining retrieval-augmented generation with metadata. We evaluate the outputs of various LLMs with and without our proposed approach on objective metrics for evaluating Q&A systems, and empirically demonstrate the superiority of our method.
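The idea of combining retrieval-augmented generation with metadata can be illustrated with a minimal sketch. This is not the authors' implementation: the `Chunk` type, the `retrieve` and `build_prompt` helpers, the metadata keys (`company`, `quarter`, `speaker`), and the sample transcript lines are all hypothetical, and plain word overlap stands in for the embedding similarity a real system would use. The sketch only shows the general pattern of filtering transcript chunks by metadata before ranking them and placing the survivors in the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A transcript passage plus descriptive metadata (hypothetical schema)."""
    text: str
    metadata: dict = field(default_factory=dict)


def tokenize(text):
    # Lowercased word set; a crude stand-in for an embedding model.
    return {w.strip(".,?!:;()").lower() for w in text.split()} - {""}


def retrieve(chunks, question, filters, k=2):
    # Step 1: keep only chunks whose metadata matches every filter,
    # so passages about other companies or periods never reach the model.
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    # Step 2: rank the remaining chunks by word overlap with the question.
    q_tokens = tokenize(question)
    ranked = sorted(
        candidates,
        key=lambda c: len(q_tokens & tokenize(c.text)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    # Ground the answer in retrieved text and allow an explicit "not found".
    context = "\n".join(
        f"[{c.metadata.get('speaker', 'unknown')}] {c.text}" for c in retrieved
    )
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up transcript lines for two fictional companies.
chunks = [
    Chunk("Revenue grew 12 percent year over year, driven by cloud.",
          {"company": "ACME", "quarter": "Q2-2023", "speaker": "CFO"}),
    Chunk("Revenue declined 3 percent due to weaker hardware demand.",
          {"company": "Globex", "quarter": "Q2-2023", "speaker": "CFO"}),
    Chunk("We expect capital expenditure to stay flat next quarter.",
          {"company": "ACME", "quarter": "Q2-2023", "speaker": "CEO"}),
]

question = "How did revenue change year over year?"
top = retrieve(chunks, question, {"company": "ACME", "quarter": "Q2-2023"}, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In this sketch the metadata filter runs before ranking, so a lexically similar passage from a different company cannot be retrieved no matter how well it matches the question. The resulting `prompt` string would then be passed to whichever LLM is being evaluated.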