- The paper presents a framework and methodology for automating the analysis of Problem-Solving Therapy (PST) sessions, applying large language models (LLMs) and fine-tuned transformer models to a large dataset of annotated therapy transcripts.
- In zero-shot evaluation, proprietary large language models such as GPT-4o achieve strong performance, with a weighted F1 score of 0.76 for strategy annotation, and produce consistent annotations, although performance drops when conversational context is provided.
- Fine-tuned transformer models like MentalBERT and DeBERTa also demonstrate competitive performance (combined F1 ~0.68), highlighting their potential as accessible alternatives for scalable and efficient automated analysis to support real-time clinical applications in mental health.
The paper presents a comprehensive examination of automating the analysis of Problem-Solving Therapy (PST) by leveraging both large language models (LLMs) and fine-tuned transformer-based architectures. The study is grounded in the detailed annotation of anonymized therapy transcripts: 240 PST sessions yielding 68,306 dialogue exchanges, from which 14,417 therapist utterances were extracted for analysis. A dual-dimensional coding framework is introduced that comprises traditional PST strategies (termed the "PS core" dimension) and an augmented set of communication strategies, thereby enriching the characterization of therapist-client interactions.
The methodology is two-fold. First, the paper evaluates the performance of several LLMs, including two proprietary models (GPT-4 and GPT-4o) and two open-source models (Llama-3.1 and Yi-1.5), using a zero-shot prompting paradigm with and without additional conversational context. Notably, GPT-4o achieved a weighted F1 score of 0.76 in the "no context" condition, outperforming its counterparts; adding context actually reduced performance (for example, GPT-4o's F1 dropped to 0.61 when supplementary dialogue was included). Annotation consistency was quantified via the entropy of each prediction,

$$u_i = -\sum_{j=1}^{k} P_\theta(a_{ij} \mid p_{ij}) \ln P_\theta(a_{ij} \mid p_{ij}),$$

where $P_\theta(a_{ij} \mid p_{ij})$ denotes the probability of a given prediction. These calculations revealed that GPT-4o provided reliable and consistent annotations, with a low mean entropy (0.035), despite exhibiting slightly higher variability than some open-source models.
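The entropy measure above is straightforward to compute. A minimal sketch in plain Python (the probability vectors below are illustrative, not values from the paper):

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (in nats) of a model's probability distribution
    over k candidate strategy labels for a single utterance.
    Lower entropy indicates a more confident, consistent annotation."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# A near-deterministic prediction yields low entropy...
confident = [0.97, 0.01, 0.01, 0.01]
# ...while a uniform distribution over k labels yields the maximum, ln(k).
uncertain = [0.25, 0.25, 0.25, 0.25]

print(prediction_entropy(confident))
print(prediction_entropy(uncertain))  # ln(4) ≈ 1.386
```

Averaging this quantity over all annotated utterances gives a mean entropy like the 0.035 the paper reports for GPT-4o.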
Second, the study explores fine-tuning three transformer-based models—DeBERTa, MentalBERT, and FLAN-T5—on a subset of 5,000 LLM-annotated therapist utterances with evaluation conducted on 500 human-annotated examples. Here, MentalBERT achieved an F1 score of 0.78 for the PS core strategies, while DeBERTa performed best on communication strategies (F1 = 0.73). Overall, these fine-tuned models reached combined F1 scores around 0.68, indicating that while proprietary LLMs provide strong performance in zero-shot settings, fine-tuning domain-adapted models can yield comparable and potentially more accessible alternatives for sensitive healthcare applications.
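Since all results are reported as weighted F1 scores, it may help to recall how that metric is computed: per-class F1 values are averaged, weighted by each class's support in the gold labels. A minimal stdlib sketch (in practice one would use a library implementation such as scikit-learn's):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1, averaged with weights equal to
    each class's share of the gold (true) labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n / total) * f1
    return score
```

Weighting by support matters here because strategy labels in therapy transcripts are imbalanced (e.g., "Defining Problems and Goals" is far more frequent than other strategies).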
Additional contributions include:
- Annotation Framework and Strategy Codebook:
The paper details a coding scheme based on the ADAPT model of PST that delineates five core steps (ranging from establishing a positive mindset to trying out solution plans), while also introducing new categories to capture nuances of interpersonal communication, such as session management and therapeutic engagement.
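A codebook like this is naturally represented as a small mapping from dimension to label set. The sketch below is illustrative: the two labels named in the text ("Session Management," "Therapeutic Engagement") and the PS core steps named elsewhere in this summary are taken from the paper, while the middle step names follow the standard ADAPT acronym and are assumptions, not the paper's exact wording:

```python
# Hypothetical encoding of the dual-dimensional codebook.
# PS core steps follow the ADAPT model; the "Predicting and Selecting
# Solutions" label is an illustrative placeholder.
CODEBOOK = {
    "ps_core": [
        "Establishing a Positive Mindset",    # A - Attitude
        "Defining Problems and Goals",        # D - Define
        "Generating Alternative Solutions",   # A - Alternatives
        "Predicting and Selecting Solutions", # P - Predict (assumed label)
        "Trying Out Solution Plans",          # T - Try out
    ],
    "communication": [
        "Session Management",
        "Therapeutic Engagement",
    ],
}

def lookup_dimension(strategy):
    """Return the coding dimension a strategy label belongs to, or None."""
    for dimension, labels in CODEBOOK.items():
        if strategy in labels:
            return dimension
    return None
```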
- Linguistic Feature Analysis:
Using the Linguistic Inquiry and Word Count (LIWC) tool, the paper analyzes the lexical and psycholinguistic characteristics of each therapeutic strategy. For instance, "Defining Problems and Goals" is most prevalent, supported by LIWC features such as "reward," while "Generating Alternative Solutions" is associated with indicators of insight and curiosity. Bigrams extracted from the data further clarify the linguistic markers that differentiate each strategy.
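The bigram analysis amounts to counting adjacent word pairs per strategy and ranking them by frequency. A rough sketch, using invented utterances rather than transcripts from the dataset:

```python
from collections import Counter

def top_bigrams(utterances, n=5):
    """Count adjacent word pairs across a list of utterances and
    return the n most frequent, as (bigram, count) tuples."""
    counts = Counter()
    for text in utterances:
        tokens = text.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)

# Illustrative (invented) therapist utterances.
sample = [
    "can we define the problem together",
    "what is the problem you want to solve",
]
print(top_bigrams(sample, n=3))  # ("the", "problem") appears in both
```

Running this per strategy label, as the paper does, surfaces the pairs that distinguish, say, problem definition from solution generation.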
- Progression Across Therapy Sessions:
Analysis of the distribution of strategies across multiple therapy visits reveals a logical evolution in therapist behavior. Early sessions focus on establishing mindset and problem definition; subsequent sessions emphasize the exploration of alternatives and actionable planning; and later phases see an increased concentration on implementing and testing solutions.
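The progression analysis reduces to computing, for each visit, the proportion of utterances assigned to each strategy. A minimal sketch with toy (invented) annotations:

```python
from collections import Counter, defaultdict

def strategy_distribution(annotations):
    """Given (session_index, strategy) pairs, return the proportion
    of each strategy within each session."""
    per_session = defaultdict(Counter)
    for session, strategy in annotations:
        per_session[session][strategy] += 1
    return {
        session: {s: c / sum(counts.values()) for s, c in counts.items()}
        for session, counts in per_session.items()
    }

# Toy data: an early visit dominated by problem definition,
# a later visit dominated by trying out solutions.
toy = [
    (1, "Defining Problems and Goals"),
    (1, "Defining Problems and Goals"),
    (1, "Generating Alternative Solutions"),
    (3, "Trying Out Solution Plans"),
    (3, "Trying Out Solution Plans"),
]
dist = strategy_distribution(toy)
```

Plotting these per-session proportions over visit index yields the progression pattern the paper describes.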
- Practical Significance and Limitations:
The paper discusses the potential of integrating automated PST dialogue analysis into real-time therapeutic support systems, which may enhance clinical documentation and decision-making. However, limitations are noted regarding the focus on text-only analysis, potential biases in LLM training data, and the restriction to English-language transcripts, suggesting careful consideration when extending these methods to diverse cultural and clinical contexts.
Overall, the study demonstrates that leveraging LLMs can offer scalable and efficient annotation of therapeutic dialogues, ultimately supporting more precise, data-driven mental health interventions while also highlighting the need for further improvements (particularly in the domain of interpersonal communication) within open-source models and fine-tuning strategies.