Evaluating LLM Fine-Tuning for Financial Proficiency: An Examination of THaLLE
The paper "THaLLE: Text Hyperlocally Augmented Large Language Extension" evaluates specialized LLMs for proficiency in financial analysis, with a particular focus on performance associated with the Chartered Financial Analyst (CFA) exam. The research is significant because it explores how fine-tuning can adapt LLMs to highly specialized domains, using a new evaluation dataset dubbed Flare CFA. The work addresses the practical challenge posed by the compute-intensive nature of large models and offers concrete fine-tuning strategies that make these models more cost-effective and domain-specific.
Overview of Research Approach
The paper primarily investigates two methodologies for enhancing LLMs' competency in financial analysis: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). The models were initially vetted through preliminary evaluations on mock CFA exams, using internal datasets spanning 2009-2019 along with newer exam data for validation. Two base models were fine-tuned: Llama3-8B Instruct and Qwen2-7B Instruct. These models, along with commercial APIs such as GPT-3.5 and GPT-4o, were benchmarked against both internal and external datasets.
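The paper does not reproduce training code, but the DPO objective it applies is standard. As an illustrative sketch only, the per-pair loss can be written as follows; the function name, scalar log-probability inputs, and default `beta` are simplifications for clarity, not details from the paper:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability that the trainable policy
    (pi_*) or the frozen reference model (ref_*) assigns to the chosen or
    rejected answer given the same prompt.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # answer over the rejected one, relative to the reference model.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: minimized when the policy's
    # preference for the chosen answer grows beyond the reference's.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; as the policy shifts probability mass toward the chosen answer, the loss falls toward zero, which is what drives preference learning without a separate reward model.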
Experimental Findings
The THaLLE model, particularly the variant fine-tuned from Qwen2-7B Instruct, outperformed its commercial counterparts on the Flare CFA test data. The results underscore that smaller open-source models can surpass commercial alternatives on specialized tasks when appropriately fine-tuned. Notably, THaLLE models fine-tuned with DPO were less susceptible to overfitting than those relying on SFT alone. Furthermore, prompt loss masking emerged as a beneficial technique during SFT, while Chain-of-Thought prompting provided a consistent advantage under both SFT and DPO configurations.
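Prompt loss masking means excluding the prompt tokens from the cross-entropy loss so that gradients come only from the response the model is being taught to produce. A minimal sketch of the idea, assuming the common convention of an ignore index of -100 (the helper below is illustrative, not from the paper):

```python
def mask_prompt_labels(input_ids, prompt_len, ignore_index=-100):
    """Build a label sequence for SFT with the prompt span masked out.

    Tokens set to `ignore_index` are skipped by the cross-entropy loss,
    so only the response tokens contribute to the training signal.
    """
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = ignore_index  # prompt token: no loss computed here
    return labels
```

For example, with a 2-token prompt followed by a 3-token response, `mask_prompt_labels([10, 11, 12, 13, 14], 2)` yields `[-100, -100, 12, 13, 14]`, so the model is penalized only for its predictions over the response span.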
Despite these advancements, the experiments highlighted the need for careful hyperparameter calibration to get the most out of fine-tuning techniques like DPO. The paper also draws attention to the nuanced behavior of Llama3 and Qwen2, which adopt or resist explicit reasoning steps depending on prompt structure, with corresponding effects on how effectively they learn.
Implications and Future Directions
The implications of this research extend to both theoretical and practical realms. Theoretically, the paper contributes to a deeper understanding of how LLMs can be adapted and optimized for specialized knowledge areas like finance, which demand precision and domain-specific comprehension. Practically, it provides a roadmap for cost-effective deployment of open-source LLMs in finance, holding potential for reducing dependency on proprietary systems without sacrificing performance.
As recommended in the paper, future research could extend to other domains, including linguistic capabilities beyond English, with a specific interest in developing Thai language proficiency. There is also potential to explore novel data augmentation techniques and weight-merging methods to efficiently incorporate multiple domain-specific skills into a single model. Finally, real-world assessments would be crucial to validate findings obtained with the CFA exam as a proxy and to explore the feasibility of LLMs functioning as financial advisors in dynamic environments.
In summary, this work explores promising avenues for enhancing and optimizing LLMs in specialized domains like financial analysis. By leveraging fine-tuning techniques and robust evaluation frameworks, the paper opens new opportunities for the deployment of open-source models in cost-sensitive environments while also suggesting broader applications across multiple subject areas and languages.