NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance (2405.00566v1)

Published 1 May 2024 in cs.CE, cs.CL, and q-fin.GN

Abstract: Recently, many works have proposed various financial LLMs (FinLLMs) by pre-training from scratch or fine-tuning open-source LLMs on financial corpora. However, existing FinLLMs exhibit unsatisfactory performance in understanding financial text when numeric variables are involved in questions. In this paper, we propose a novel LLM, called numeric-sensitive LLM (NumLLM), for Chinese finance. We first construct a financial corpus from financial textbooks, which is essential for improving the numeric capability of LLMs during fine-tuning. After that, we train two separate low-rank adaptation (LoRA) modules by fine-tuning on our constructed financial corpus. One module adapts the general-purpose LLM to the financial domain, and the other enhances the ability of NumLLM to understand financial text with numeric variables. Lastly, we merge the two LoRA modules into the foundation model to obtain NumLLM for inference. Experiments on a financial question-answering benchmark show that NumLLM boosts the performance of the foundation model and achieves the best overall performance among all baselines, on both numeric and non-numeric questions.
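The abstract's recipe (train two separate LoRA modules, then merge them into the foundation model for inference) can be sketched with the Hugging Face PEFT library. The sketch below is only an illustration of that workflow under stated assumptions: the base model name, adapter paths, and LoRA hyperparameters are placeholders rather than the paper's actual settings, and merging the adapters sequentially via merge_and_unload is just one plausible reading of the merging step.

```python
# Hedged sketch of a two-LoRA fine-tune-and-merge workflow using Hugging Face
# transformers + peft. All names and hyperparameters are illustrative, not the
# paper's configuration.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, PeftModel

BASE_MODEL = "base-chinese-llm"  # placeholder foundation model identifier

# 1) Attach a fresh LoRA module to the frozen foundation model for one of the
#    two fine-tuning runs (financial-domain adaptation or numeric sensitivity).
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
# ... fine-tune `model` on the constructed financial-textbook corpus, then:
# model.save_pretrained("lora-financial")
# Repeat with the numeric-focused data to produce "lora-numeric".

# 2) Fold both trained LoRA modules back into the foundation weights so that
#    inference runs on a single dense model.
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "lora-financial").merge_and_unload()
merged = PeftModel.from_pretrained(merged, "lora-numeric").merge_and_unload()
merged.save_pretrained("numllm-merged")
```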

