
CFGPT: Chinese Financial Assistant with Large Language Model (2309.10654v2)

Published 19 Sep 2023 in cs.CL, cs.AI, and cs.CE

Abstract: LLMs have demonstrated great potential in natural language processing tasks within the financial domain. In this work, we present a Chinese Financial Generative Pre-trained Transformer framework, named CFGPT, which includes a dataset (CFData) for pre-training and supervised fine-tuning, a financial LLM (CFLLM) to adeptly manage financial texts, and a deployment framework (CFAPP) designed to navigate real-world financial applications. CFData comprises both a pre-training dataset and a supervised fine-tuning dataset: the pre-training dataset collates Chinese financial data and analytics, alongside a smaller subset of general-purpose text, totaling 584M documents and 141B tokens, while the supervised fine-tuning dataset is tailored to six distinct financial tasks, covering various facets of financial analysis and decision-making, with 1.5M instruction pairs and 1.5B tokens in total. The CFLLM, based on InternLM-7B to balance model capability and size, is trained on CFData in two stages: continued pre-training and supervised fine-tuning. The CFAPP is centered on LLMs and augmented with additional modules to ensure multifaceted functionality in real-world applications. Our codes are released at https://github.com/TongjiFinLab/CFGPT.
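The two-stage recipe in the abstract (continued pre-training on raw financial text, then supervised fine-tuning on instruction pairs) can be sketched at a toy scale. This is purely illustrative and is not the paper's training code: CFLLM is trained on InternLM-7B with standard large-scale language-modeling objectives, whereas the character-bigram "model" and the response up-weighting below are stand-in assumptions chosen only to make the stage ordering concrete.

```python
# Toy sketch of a two-stage training recipe (NOT the CFGPT implementation):
# stage 1 consumes raw corpus text, stage 2 consumes (instruction, response)
# pairs. A character-bigram counter stands in for the language model.
from collections import Counter

def pretrain(model, corpus):
    """Stage 1: continued pre-training on raw domain text."""
    for doc in corpus:
        for a, b in zip(doc, doc[1:]):
            model[(a, b)] += 1
    return model

def finetune(model, pairs, weight=5):
    """Stage 2: supervised fine-tuning on instruction pairs.
    Up-weighting the pair text stands in for the loss being
    concentrated on supervised targets."""
    for instruction, response in pairs:
        text = instruction + response
        for a, b in zip(text, text[1:]):
            model[(a, b)] += weight
    return model

model = Counter()
model = pretrain(model, ["financial report text", "market analysis text"])
model = finetune(model, [("summarize: ", "the report shows growth")])
```

The point is only the data flow: the same model state passes through both stages, with stage 2 reusing the stage-1 statistics rather than starting fresh.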

Authors (7)
  1. Jiangtong Li (24 papers)
  2. Yuxuan Bian (9 papers)
  3. Guoxuan Wang (4 papers)
  4. Yang Lei (59 papers)
  5. Dawei Cheng (38 papers)
  6. Zhijun Ding (9 papers)
  7. Changjun Jiang (47 papers)
Citations (9)