
Beyond Classification: Financial Reasoning in State-of-the-Art Language Models (2305.01505v2)

Published 30 Apr 2023 in cs.CL, cs.AI, and cs.CY

Abstract: Large language models (LLMs) with 100 billion or more parameters have demonstrated remarkable ability in complex multi-step reasoning tasks. However, such generic advances have so far been applied to only a few fields, such as the clinical or legal domains, leaving financial reasoning largely unexplored. To the best of our knowledge, the ability of LLMs to solve financial reasoning problems has not been studied before, and whether it can be performed at any scale remains unknown. To address this gap, this research presents a comprehensive investigation into the potential application of LLMs in the financial domain, covering task formulation, synthetic data generation, prompting methods, and evaluation capabilities. The study benchmarks various GPT variants with parameter scales ranging from 2.8B to 13B, with and without instruction tuning, on datasets of varying sizes. The results show that the ability to generate coherent financial reasoning first emerges at 6B parameters and continues to improve with better instruction tuning or larger datasets. The study also releases a publicly accessible dataset, sFIOG (Synthetic-Financial Investment Opinion Generation), consisting of 11,802 synthetic investment-thesis samples, to support further research in financial reasoning. Overall, this work contributes to understanding the efficacy of LLMs in finance, with particular emphasis on their ability to engage in sophisticated reasoning and analysis in the context of investment decision-making.
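The abstract describes prompting GPT-style models at several parameter scales to produce investment theses. As a rough illustration only, the sketch below prompts an open 2.8B-parameter checkpoint (EleutherAI's Pythia, one plausible choice at the smallest scale the paper benchmarks); the company profile, prompt wording, and decoding settings are assumptions for illustration, not the paper's actual sFIOG setup.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: a Pythia checkpoint stands in for the paper's 2.8B GPT variant.
    MODEL = "EleutherAI/pythia-2.8b"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, torch_dtype=torch.float16, device_map="auto"
    )

    # Hypothetical instruction-style prompt; the paper's actual prompt templates
    # are not reproduced on this page.
    prompt = (
        "Below is a company profile. Write an investment thesis with a clear "
        "recommendation and step-by-step reasoning.\n\n"
        "Company: ACME Corp, a mid-cap industrial automation firm with 12% "
        "year-over-year revenue growth and expanding operating margins.\n\n"
        "Investment thesis:"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.9
    )
    # Decode only the newly generated continuation, not the prompt itself.
    print(tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ))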

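The abstract also mentions evaluating the models' generation capabilities. A minimal sketch follows, assuming reference-based text-generation metrics such as ROUGE and BERTScore are used to score a generated thesis against a reference sample; the reference/candidate strings here are invented, and a real evaluation would iterate over the 11,802 sFIOG samples.

    # pip install rouge-score bert-score
    from rouge_score import rouge_scorer
    from bert_score import score as bert_score

    # Hypothetical reference/candidate pair for illustration only.
    reference = "We rate ACME Corp a buy given sustained revenue growth and margin expansion."
    candidate = "ACME Corp merits a buy rating: revenue keeps growing and margins are widening."

    # N-gram overlap: ROUGE-1 and ROUGE-L F-measures.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge = scorer.score(reference, candidate)
    print({name: round(s.fmeasure, 3) for name, s in rouge.items()})

    # Contextual-embedding similarity: BERTScore returns precision/recall/F1 tensors.
    P, R, F1 = bert_score([candidate], [reference], lang="en")
    print(f"BERTScore F1: {F1.item():.3f}")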
Authors (5)
  1. Guijin Son (20 papers)
  2. Hanearl Jung (1 paper)
  3. Moonjeong Hahm (3 papers)
  4. Keonju Na (1 paper)
  5. Sol Jin (2 papers)
Citations (15)