Fine Tuning Large Language Models to Deliver CBT for Depression (2412.00251v1)

Published 29 Nov 2024 in cs.AI and cs.HC

Abstract: Cognitive Behavioral Therapy (CBT) is a well-established, evidence-based treatment for Major Depressive Disorder. Unfortunately, there exist significant barriers to individuals accessing CBT, including cost, scarcity of therapists and stigma. This study explores the feasibility of fine-tuning small open weight LLMs to deliver CBT for depression. Using 58 sets of synthetic CBT transcripts generated by the Nous Research fine-tune of Llama 3.1 405b, we fine-tuned three models: Mistral 7b v0.3, Qwen 2.5 7b, and Llama 3.1 8b. CBT fidelity was evaluated through a modified Cognitive Therapy Rating Scale (CTRS). All fine-tuned models were compared against each other, as well as their instruct-tuned variants. Simulated patient transcripts were generated for the purpose of evaluating model performance, with the instruct and CBT-tuned models acting as the therapist and DeepSeek-V2.5 acting as the patient. These simulated transcripts were evaluated on a modified CTRS by Gemini 1.5 Pro-002. Our findings demonstrated that the CBT-tuned models significantly outperformed their instruct-tuned counterparts, with an average improvement of 11.33 points (p < 0.001) on total CTRS score. Llama 3.1 8b had the strongest performance (mean CTRS score 67.86 +/- 7.24), followed by Qwen 2.5 7b (64.28 +/- 9.55) and Mistral 7b v0.3 (64.17 +/- 9.79), with these differences between models being statistically significant. The CBT-tuned models were competent in implementing core CBT techniques and providing empathetic responses, however, there were limitations observed in agenda adherence, exploration depth and long-context coherence. This study establishes that CBT specific fine-tuning can effectively encode therapeutic competencies in small LLMs, though significant technical and ethical considerations must be resolved prior to clinical deployment.

Summary

The paper shows that CBT-specific fine-tuning of small open-weight LLMs yields an average improvement of 11.33 points (p < 0.001) in total CTRS score over the instruct-tuned counterparts.
The study fine-tunes with QLoRA on 58 sets of synthetic CBT transcripts, each simulating a complete course of therapy for a distinct patient profile.
The research highlights both the potential and limitations of LLMs in delivering CBT, emphasizing the need for further ethical and technical refinements.

Fine-Tuning LLMs for Delivering Cognitive Behavioral Therapy for Depression

The paper by Talha Tahir explores the feasibility of fine-tuning small open-weight LLMs to deliver Cognitive Behavioral Therapy (CBT) for Major Depressive Disorder (MDD). Because cost, a shortage of therapists, and stigma keep many people from accessing traditional therapy, LLMs offer a potential way to widen access. The paper focuses on three LLMs (Mistral 7b v0.3, Qwen 2.5 7b, and Llama 3.1 8b), each fine-tuned to act as the therapist in CBT interactions. The models were evaluated on a modified Cognitive Therapy Rating Scale (CTRS), which was used to measure CBT fidelity.

Methodology

The study generated 58 sets of synthetic CBT transcripts with the Nous Research fine-tune of Llama 3.1 405b, each set simulating a complete therapy course for a unique patient profile. The profiles were built from a variety of attributes to lend realism to the interactions, and the transcripts were generated in distinct phases to follow the conventions of CBT. Fine-tuning used Quantized Low-Rank Adaptation (QLoRA) to keep costs low and ran on a single NVIDIA A40 GPU.
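To make the cost argument for QLoRA concrete, the sketch below counts parameters for a single weight matrix. It relies only on the general definition of LoRA (the frozen weight is augmented with two low-rank matrices) and on QLoRA storing the frozen weights at 4 bits. The layer shape and rank are assumptions chosen for illustration; the summary does not report the paper's actual hyperparameters.

```python
# Illustrative arithmetic only: the layer shape and rank below are assumed,
# not taken from the paper.


def full_weight_params(d_out: int, d_in: int) -> int:
    """Parameters in one dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def four_bit_bytes(n_params: int) -> float:
    """Approximate storage for frozen weights at 4 bits each (quantization constants ignored)."""
    return n_params * 0.5


D_OUT, D_IN, RANK = 4096, 4096, 16  # hypothetical projection layer and rank

frozen = full_weight_params(D_OUT, D_IN)
trainable = lora_adapter_params(D_OUT, D_IN, RANK)
trainable_fraction = trainable / frozen

print(f"frozen weights:    {frozen:,}")
print(f"trainable weights: {trainable:,}")
print(f"trainable share:   {trainable_fraction:.2%}")
print(f"4-bit storage:     {four_bit_bytes(frozen) / 2**20:.1f} MiB")
```

Under these assumed sizes, under one percent of the layer's weights receive gradients, which is the property that makes training a 7b to 8b model on one GPU plausible.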

Simulated therapy sessions were run with DeepSeek-V2.5 as the patient model, while the CBT-tuned models and their instruct-tuned baselines acted as therapists. The resulting transcripts were scored for CBT fidelity by an automated pipeline built on Google's Gemini 1.5 Pro-002, which rated the models on key CBT competencies including agenda setting, interpersonal effectiveness, and application of CBT techniques.
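The evaluation setup described above can be sketched as a loop in which a therapist model and a patient model take turns, followed by a judge that returns per-item ratings. This is a minimal sketch under stated assumptions, not the paper's pipeline: the three callables are stand-in stubs rather than real model calls, the therapist speaking first is an assumption, and the item names and ratings are hypothetical placeholders.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Agent = Callable[[List[Message]], str]


def simulate_session(therapist: Agent, patient: Agent, n_exchanges: int) -> List[Message]:
    """Alternate therapist and patient turns; each agent sees the transcript so far."""
    transcript: List[Message] = []
    for _ in range(n_exchanges):
        transcript.append({"role": "therapist", "content": therapist(transcript)})
        transcript.append({"role": "patient", "content": patient(transcript)})
    return transcript


def total_score(item_scores: Dict[str, int]) -> int:
    """Sum per-item ratings into one total, as a rating scale total would be formed."""
    return sum(item_scores.values())


# --- Hypothetical stubs standing in for the therapist, patient and judge models ---

def stub_therapist(history: List[Message]) -> str:
    return f"[therapist reply after {len(history)} messages]"


def stub_patient(history: List[Message]) -> str:
    return f"[patient reply after {len(history)} messages]"


def stub_judge(transcript: List[Message]) -> Dict[str, int]:
    # Placeholder ratings; a real judge would read the transcript.
    return {"agenda_setting": 4, "interpersonal_effectiveness": 5, "cbt_techniques": 4}


session = simulate_session(stub_therapist, stub_patient, n_exchanges=3)
scores = stub_judge(session)
total = total_score(scores)
print(len(session), "messages; total score", total)
```

Keeping the agents as plain callables means the same loop can be run once with a CBT-tuned therapist and once with its instruct-tuned baseline, so both sets of transcripts are produced under identical conditions.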

Findings

Fine-tuning significantly improved the CBT abilities of the LLMs: on average, the CBT-tuned models scored 11.33 points higher on total CTRS (p < 0.001) than their instruct-tuned counterparts. The fine-tuned Llama 3.1 8b model achieved the highest mean CTRS score (67.86 +/- 7.24), ahead of Qwen 2.5 7b (64.28 +/- 9.55) and Mistral 7b v0.3 (64.17 +/- 9.79), and the differences between models were statistically significant. Llama 3.1 8b also showed the smallest spread in scores, which suggests it was the most consistent of the three.
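Figures such as a mean score with its spread, or an average improvement over a baseline, can be computed as in the sketch below. The score lists are invented for illustration and are not the paper's data, and whether the reported spread is a sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Hypothetical per-transcript total scores; NOT data from the paper.
cbt_tuned_scores = [70, 62, 68, 74, 66]
instruct_scores = [58, 55, 60, 57, 50]

cbt_mean = mean(cbt_tuned_scores)
instruct_mean = mean(instruct_scores)

# Sample standard deviation (n - 1 in the denominator).
cbt_sd = stdev(cbt_tuned_scores)
instruct_sd = stdev(instruct_scores)

# Average improvement of the tuned model over its baseline.
improvement = cbt_mean - instruct_mean

print(f"CBT-tuned: {cbt_mean:.2f} +/- {cbt_sd:.2f}")
print(f"Instruct:  {instruct_mean:.2f} +/- {instruct_sd:.2f}")
print(f"Mean improvement: {improvement:.2f}")
```

A significance claim such as p < 0.001 additionally requires a hypothesis test over the full set of scores, which this sketch does not attempt.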

The analyses found the CBT-tuned models competent in areas such as understanding, interpersonal effectiveness, and guided discovery. Limitations remained in agenda adherence, exploration depth, and coherence over long contexts. In addition, biases in the fine-tuning data led the models to introduce CBT concepts prematurely, suggesting that the distribution of the training data needs refinement in subsequent studies.

Implications and Future Directions

This paper provides valuable insights into the potential for using smaller, open-weight LLMs to fulfill roles that require subtle and complex interaction capabilities, such as those needed in delivering CBT for depression. The results underscore the importance of targeted fine-tuning to imbue LLMs with specialized skills and knowledge, even within computationally constrained environments.

Future research could address the technical limitations, for example by exploring new fine-tuning strategies and generating higher-quality synthetic training data. Any application of this technology must also account for ethics and patient safety, particularly where clinical deployment is contemplated. Further development should prioritize robust validation through clinical trials, so that AI tools are shown to be both effective and safe in real-world use.

The paper thus not only advances academic understanding of fine-tuning LLMs for therapy-related tasks but also opens avenues for technology-driven interventions, potentially revolutionizing access to mental health care. However, meticulous effort is required to navigate the ethical and practical challenges inherent in this domain, ensuring that AI complements existing therapeutic practices sustainably and ethically.
