Benchmarking Large Language Model Volatility (2311.15180v1)

Published 26 Nov 2023 in q-fin.TR and cs.CL

Abstract: The impact of non-deterministic outputs from LLMs is not well examined for financial text understanding tasks. Through a compelling case study on investing in the US equity market via news sentiment analysis, we uncover substantial variability in sentence-level sentiment classification results, underscoring the innate volatility of LLM outputs. These uncertainties cascade downstream, leading to more significant variations in portfolio construction and return. While tweaking the temperature parameter in the LLM decoder presents a potential remedy, it comes at the expense of stifled creativity. Similarly, while ensembling multiple outputs mitigates the effect of volatile outputs, it demands a notable computational investment. This work furnishes practitioners with invaluable insights for adeptly navigating uncertainty in the integration of LLMs into financial decision-making, particularly in scenarios dictated by non-deterministic information.
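To make the experiment the abstract describes concrete, here is a minimal sketch (not the paper's code) of measuring run-to-run volatility in sentence-level sentiment classification, with the two mitigations mentioned: lowering the decoder temperature and ensembling repeated samples by majority vote. It assumes an OpenAI-compatible chat API; the model name, prompt wording, and sample size are illustrative, not taken from the paper.

```python
# Sketch of the volatility experiment: classify the same sentence n times,
# ensemble by majority vote, and report how often the samples disagree.
from collections import Counter

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Classify the sentiment of this news sentence toward the company "
    "as exactly one word: positive, negative, or neutral.\n\nSentence: {s}"
)

def classify_once(sentence: str, temperature: float = 1.0) -> str:
    """One non-deterministic sentiment call; output can vary run to run."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT.format(s=sentence)}],
        temperature=temperature,  # lower values reduce output volatility
    )
    return resp.choices[0].message.content.strip().lower()

def ensemble_label(sentence: str, n: int = 10, temperature: float = 1.0):
    """Ensemble n samples by majority vote; also return the disagreement rate."""
    labels = [classify_once(sentence, temperature) for _ in range(n)]
    majority, votes = Counter(labels).most_common(1)[0]
    return majority, 1 - votes / n

if __name__ == "__main__":
    label, disagreement = ensemble_label(
        "Acme Corp shares slid after the company cut its revenue outlook."
    )
    print(f"majority label: {label}, disagreement rate: {disagreement:.0%}")
```

The sketch mirrors the trade-offs the abstract notes: dropping `temperature` toward 0 shrinks the disagreement rate but constrains the model's output diversity, while raising `n` stabilizes the ensembled label at the cost of n times as many API calls.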

Authors (1)
  1. Boyang Yu (22 papers)
Citations (1)