
Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring (2310.15461v2)

Published 24 Oct 2023 in cs.HC and cs.CL

Abstract: Self-guided mental health interventions, such as "do-it-yourself" tools to learn and practice coping strategies, show great promise to improve access to mental health care. However, these interventions are often cognitively demanding and emotionally triggering, creating accessibility barriers that limit their wide-scale implementation and adoption. In this paper, we study how human-LLM interaction can support self-guided mental health interventions. We take cognitive restructuring, an evidence-based therapeutic technique to overcome negative thinking, as a case study. In an IRB-approved randomized field study on a large mental health website with 15,531 participants, we design and evaluate a system that uses LLMs to support people through various steps of cognitive restructuring. Our findings reveal that our system positively impacts emotional intensity for 67% of participants and helps 65% overcome negative thoughts. Although adolescents report relatively worse outcomes, we find that tailored interventions that simplify LLM generations improve overall effectiveness and equity.


Summary

  • The paper demonstrates that a GPT-3-powered, five-step LM-assisted cognitive restructuring process reduces negative emotion intensity for 67.64% of participants.
  • A randomized field study with 15,531 participants used mixed methods to evaluate quantitative outcomes and capture qualitative insights on usability and effectiveness.
  • The intervention design, featuring contextualization, iterative reframe refinement, and robust safety filters, shows promise in enhancing engagement and addressing equity challenges.

This paper, "Facilitating Self-Guided Mental Health Interventions Through Human-LLM Interaction: A Case Study of Cognitive Restructuring" (2310.15461), explores the potential of using LLMs (LMs) to enhance self-guided mental health interventions, specifically focusing on Cognitive Restructuring. The authors address the challenges of traditional self-guided tools, which often require significant cognitive effort, can be emotionally triggering, and suffer from low engagement and high dropout rates.

The core of the research is a system designed for human-LM interaction to support individuals through the Cognitive Restructuring process. Cognitive Restructuring is an evidence-based technique involving identifying negative thoughts, recognizing thinking traps (cognitive distortions), and reframing the thoughts into more balanced perspectives. The system guides participants through five steps: describing the negative thought, detailing the situation, reflecting on the emotion, identifying potential thinking traps with LM assistance, and writing a reframed thought with LM-generated suggestions.
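
To make the flow concrete, here is a minimal sketch of the five-step sequence as a data structure. The step identifiers, prompt wording, and flags are illustrative assumptions, not the paper's actual interface text:

```python
# Illustrative sketch of the five-step cognitive restructuring flow.
# Step names and prompt wording are assumptions, not the paper's UI text.
COGNITIVE_RESTRUCTURING_STEPS = [
    {"step": "negative_thought", "lm_assisted": False,
     "prompt": "What negative thought are you having?"},
    {"step": "situation", "lm_assisted": False,
     "prompt": "Briefly describe the situation behind this thought."},
    {"step": "emotion", "lm_assisted": False,
     "prompt": "What emotion do you feel, and how intense is it?"},
    {"step": "thinking_traps", "lm_assisted": True,
     "prompt": "Here are thinking traps your thought may contain."},
    {"step": "reframe", "lm_assisted": True,
     "prompt": "Write a more balanced thought; here are suggestions."},
]
```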

The system leverages LMs (specifically GPT-3, fine-tuned or used with retrieval-enhanced in-context learning) for two key tasks:

  1. Thinking Trap Identification: Given a participant's thought, the LM suggests the most likely thinking traps from a predefined list (13 common types), along with their estimated likelihoods. Definitions and examples of these traps are provided as psychoeducation.
  2. Reframe Generation: Based on the participant's thought and situation, the LM generates multiple potential reframed thoughts. Psychoeducation on reframing techniques for the identified traps is also included. Participants can select, edit, or use these suggestions as inspiration.
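
The summary does not reproduce the paper's prompts or fine-tuning setup, so the following is only a rough sketch of the thinking trap identification call, using the current OpenAI chat API as a stand-in for fine-tuned GPT-3; the model name, prompt wording, and JSON output contract are all assumptions:

```python
import json
from openai import OpenAI  # official openai Python client (v1+)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A few of the 13 trap types from the paper's predefined list;
# the full list is abbreviated here.
THINKING_TRAPS = [
    "All-or-nothing thinking", "Catastrophizing", "Mind reading",
    "Overgeneralizing", "Labeling", "Personalizing", "Fortune telling",
]

def identify_thinking_traps(thought: str) -> dict:
    """Return a mapping of likely thinking traps to estimated likelihoods."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in; the paper used fine-tuned GPT-3
        messages=[
            {"role": "system", "content":
                "You identify cognitive distortions (thinking traps) in a "
                "negative thought. Choose only from this list: "
                + ", ".join(THINKING_TRAPS) + ". Reply with a JSON object "
                "mapping each likely trap to a likelihood between 0 and 1."},
            {"role": "user", "content": thought},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```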

A key design feature is the ability for participants to iteratively refine their chosen reframe. They can request more specific suggestions from the LM categorized as "actionable," "empathic," or "personalized."
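
One simple way to realize this refinement loop is to map each category onto an extra prompt instruction. In the sketch below, only the three category names come from the paper; the instruction wording and API usage are assumptions:

```python
from openai import OpenAI

client = OpenAI()

# Category names are from the paper; the instruction text is an assumption.
REFINEMENT_STYLES = {
    "actionable": "Rewrite the reframe to suggest one small, concrete next step.",
    "empathic": "Rewrite the reframe in a warmer, more validating tone.",
    "personalized": "Rewrite the reframe to reference the user's specific situation.",
}

def refine_reframe(thought: str, situation: str, reframe: str, style: str) -> str:
    """Ask the LM for a more specific version of the chosen reframe."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for the paper's fine-tuned GPT-3
        messages=[
            {"role": "system",
             "content": "You help users reframe negative thoughts."},
            {"role": "user", "content":
                f"Thought: {thought}\nSituation: {situation}\n"
                f"Current reframe: {reframe}\n{REFINEMENT_STYLES[style]}"},
        ],
    )
    return response.choices[0].message.content
```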

To ensure safety, the system integrates content filtering mechanisms, combining Azure OpenAI's classification-based filtering for harmful categories (hate, sexual, violence, self-harm) with a rule-based filter using regular expressions specifically targeting suicidal ideation and self-harm language. Participants also have an option to manually flag inappropriate content.
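
A minimal sketch of how the two layers might compose is shown below. The regex patterns are illustrative placeholders (the paper's actual patterns are not published in this summary), and the classification results are assumed to arrive from the provider as per-category flags:

```python
import re

# Placeholder patterns only -- NOT the paper's actual regexes. A deployed
# system would use a clinically vetted lexicon for self-harm language.
SELF_HARM_RE = re.compile(
    r"\bsuicid(e|al)\b|\bkill(ing)? myself\b|\b(hurt|harm)(ing)? myself\b",
    re.IGNORECASE,
)

def passes_safety_filter(text: str, classifier_flags: dict) -> bool:
    """Layer 1: provider-side classification flags (e.g., from Azure
    OpenAI content filtering). Layer 2: rule-based regex check aimed
    specifically at suicidal ideation and self-harm language."""
    if any(classifier_flags.get(c) for c in
           ("hate", "sexual", "violence", "self_harm")):
        return False
    if SELF_HARM_RE.search(text):
        return False
    return True
```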

The system was evaluated through a large-scale randomized field study involving 15,531 participants on the Mental Health America (MHA) website. The study used a mixed-methods approach, collecting quantitative outcome measures (reduction in emotion intensity, reframe relatability, helpfulness, memorability, and skill learnability) and qualitative feedback.

Key Findings and Practical Implications:

  • Overall Effectiveness (RQ2a): The intervention showed positive results: 67.64% of participants reported a reduction in negative emotion intensity after using the system, and 65.65% found the reframes helpful in overcoming negative thoughts. Participants with higher initial emotional intensity experienced a greater reduction in emotion but found the reframing/learning process more challenging. Qualitative feedback indicated that the LM assistance helped participants overcome cognitive barriers (e.g., feeling stuck) and emotional barriers (e.g., making the process less daunting), and that participants valued the exploration of multiple viewpoints.
  • Impact of Design Hypotheses (RQ2b):
    • Contextualization (H2): Asking participants to describe the situation related to their thought led to significantly more helpful reframes (2.80% increase) without increasing dropout rates. Asking about emotions, however, led to lower reframe relatability, potentially because the current LM setup doesn't explicitly incorporate emotion into its outputs. This highlights the need for LMs to effectively utilize all provided context for personalization.
    • Psychoeducation (H3): While qualitatively appreciated by participants, integrating psychoeducation (definitions, examples, tips) did not lead to statistically significant quantitative improvements in self-reported outcomes, including skill learnability. This suggests that while useful for understanding, it might not directly translate to perceived immediate impact on these specific metrics in a single-session interaction.
    • Interactivity (H4): Providing the option for participants to seek further LM assistance for reframe refinement led to a significant 23.73% greater reduction in emotion intensity among those who had the option available. Furthermore, participants who actively used this option reported higher reframe helpfulness and skill learnability. Explicitly seeking actionable reframes yielded the most positive outcomes across all metrics, while seeking empathic reframes also showed benefits. This suggests that empowering users to guide the LM output toward their specific needs (especially action-oriented ones) improves intervention effectiveness.
    • Safety (H5): The implemented safety mechanisms were largely effective, with only 0.65% of LM-generated suggestions being flagged as inappropriate by users. Analysis of flagged content showed instances where the LM repeated negative sentiment, highlighting a nuanced challenge in generating empathetic but non-reinforcing responses. Importantly, no content related to suicidal ideation or self-harm was flagged, suggesting the specific filters for these high-risk topics were effective.
  • Equity (RQ3): The intervention's effectiveness varied across different issues and demographics. Participants expressing thoughts related to Hopelessness, Loneliness, and Tasks/Achievement reported worse outcomes, while those with Parenting and Work-related issues reported better outcomes. Demographically, adolescents (13-17), males, and individuals with lower education levels reported worse outcomes compared to adults (>=25) and those with graduate/doctorate education. This highlights existing biases or differential effectiveness of current LMs and intervention designs across populations.
  • Improving Equity (Adolescents): Recognizing the disparity for adolescents, the authors hypothesized that linguistic complexity played a role. An RCT showed that providing simpler and more casual LM-generated reframes significantly increased reframe relatability and helpfulness for adolescents aged 13-14 and helpfulness for those aged 15-17, without impacting adults. This demonstrates a practical strategy for improving equity by tailoring LM outputs to specific user subgroups.
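
In implementation terms, the adolescent arm amounts to adding a style constraint at generation time. A hedged sketch follows; the instruction text is an assumption, as the paper's prompt is not reproduced in this summary:

```python
# Instruction text is an assumption, not the paper's actual prompt.
SIMPLIFY_INSTRUCTION = (
    "Write at roughly a middle-school reading level, in a casual, friendly "
    "tone. Use short sentences and everyday words."
)

def build_reframe_prompt(thought: str, situation: str, simplify: bool) -> str:
    prompt = (f"Thought: {thought}\nSituation: {situation}\n"
              "Suggest a balanced, believable reframe of this thought.")
    if simplify:  # enabled for the simpler-and-more-casual condition
        prompt += "\n" + SIMPLIFY_INSTRUCTION
    return prompt
```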

Implementation Considerations:

  • LM Selection and Fine-tuning: The paper used GPT-3, suggesting that powerful generative models are suitable. Fine-tuning on domain-specific data (such as thinking traps and expert-generated reframes) is crucial for task accuracy and relevance; one possible training-data format is sketched after this list. Retrieval-enhanced methods can help ground generation in expert examples.
  • System Architecture: A multi-step process guiding the user through cognitive restructuring is feasible. The interaction should balance LM suggestion (assistance) with user control (agency, editing).
  • Context Integration: Incorporating user-provided context, such as the surrounding situation, is beneficial for personalization. LMs need to be capable of effectively utilizing this context during generation. For emotional context, LMs or methods that can explicitly address and validate emotions may be required.
  • Interactivity Design: Allowing users to request specific types of reframe modifications (e.g., actionable) is a promising approach to increase user engagement and perceived helpfulness. Designing clear options for iterative refinement is important.
  • Safety Implementation: Robust content filtering, especially for high-risk mental health topics, is paramount. A multi-layered approach (classification + rule-based) is recommended. User flagging mechanisms provide valuable feedback for system improvement. Ethical principles (non-maleficence, beneficence, autonomy, justice, explicability) should guide design and deployment in mental health contexts.
  • Addressing Equity: Implement strategies to assess and mitigate disparate outcomes across user subgroups. Tailoring LM output style (e.g., language complexity) based on demographics like age can significantly improve effectiveness for specific populations. Future work could explore tailoring based on issue type or other demographic factors.
  • Computational Requirements: Using large LMs like GPT-3 requires access to computational resources and APIs. The computational cost of generating multiple suggestions and performing iterative refinements should be considered for deployment at scale.
  • Deployment Platform: Deploying on existing mental health platforms (like MHA) allows access to a large, ecologically valid user base actively seeking mental health support, providing valuable real-world evaluation data.
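
As referenced in the fine-tuning item above, training data for the thinking trap task could be expressed in the legacy GPT-3 prompt/completion JSONL format. The examples below are invented for illustration, not drawn from the paper's dataset:

```python
import json

# Invented illustrations in the legacy GPT-3 prompt/completion JSONL format;
# not examples from the paper's training data.
examples = [
    {"prompt": "Thought: I failed one exam, so I'm a total failure.\n\nTraps:",
     "completion": " All-or-nothing thinking; Labeling"},
    {"prompt": "Thought: My friend didn't text back; she must hate me.\n\nTraps:",
     "completion": " Mind reading; Catastrophizing"},
]
with open("thinking_traps_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```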

Overall, the paper provides strong empirical evidence for the utility of human-LM interaction in supporting self-guided cognitive restructuring. It offers practical design insights, evaluates the impact of specific features through rigorous trials, and highlights the critical importance of addressing equity and safety when applying LMs in mental health.
