Psy-LLM: Scaling up Global Mental Health Psychological Services with AI-based Large Language Models (2307.11991v2)

Published 22 Jul 2023 in cs.CL and cs.AI

Abstract: The demand for psychological counselling has grown significantly in recent years, particularly with the global outbreak of COVID-19, which has heightened the need for timely and professional mental health support. Online psychological counselling has emerged as the predominant mode of providing services in response to this demand. In this study, we propose the Psy-LLM framework, an AI-based assistive tool leveraging LLMs for question-answering in psychological consultation settings to ease the demand for mental health professions. Our framework combines pre-trained LLMs with real-world professional Q&A from psychologists and extensively crawled psychological articles. The Psy-LLM framework serves as a front-end tool for healthcare professionals, allowing them to provide immediate responses and mindfulness activities to alleviate patient stress. Additionally, it functions as a screening tool to identify urgent cases requiring further assistance. We evaluated the framework using intrinsic metrics, such as perplexity, and extrinsic evaluation metrics, with human participant assessments of response helpfulness, fluency, relevance, and logic. The results demonstrate the effectiveness of the Psy-LLM framework in generating coherent and relevant answers to psychological questions. This article discusses the potential and limitations of using LLMs to enhance mental health support through AI technologies.

Here is a summary of the paper "Psy-LLM: Scaling up Global Mental Health Psychological Services with AI-based Large Language Models" (Lai et al., 2023).

The paper addresses the increasing global demand for psychological counseling, exacerbated by events like the COVID-19 pandemic, and the insufficient supply of mental health professionals, particularly in densely populated regions such as China. To help bridge this gap, the authors propose the Psy-LLM framework, an AI-based assistive tool leveraging LLMs for question-answering in psychological consultation settings.

The Psy-LLM framework is designed to serve a dual purpose:

  1. Assistive tool for healthcare professionals: It can provide immediate response suggestions and mindfulness activities to human counselors during online consultations, easing their workload and potentially lowering the entry barrier for newly trained staff.
  2. Standalone tool for users: During off-hours or periods of high demand when human counselors are unavailable, Psy-LLM can function as a web-based frontend allowing users to interact directly with the system for timely support and screening of urgent cases.

The framework utilizes pre-trained LLMs and is further trained on a corpus combining real-world professional Q&A from psychologists (the PsyQA dataset) and a large volume of crawled psychological articles from Chinese social media platforms (Tianya, Zhihu, Yixinli). The authors specifically investigate two large-scale Chinese pre-trained models: PanGu and WenZhong.

Implementation Details:

  • Model Selection: PanGu (variants from 350M to 200B parameters) and WenZhong (based on GPT-2, variants including 110M and 3.5B parameters) were chosen due to their pre-training on large Chinese corpora and generation capabilities. PanGu's architecture is similar to GPT-3, incorporating a unique query layer for next-token prediction.
  • Data Collection: A substantial dataset (approximately 2.85GB, 400,000 samples) was collected. This included 22,000 questions and 56,000 answers from the PsyQA dataset, reviewed by psychology professionals, and approximately 350,000 samples crawled from various online platforms over ~70 hours. A distributed crawling approach was used to handle the scale and overcome anti-crawler measures.
  • Data Cleaning: A multi-step cleaning process was applied, including removing duplicates, advertisements, short samples (<150 characters), URLs, user names, repeated punctuation, and converting traditional to simplified Chinese.
  • Model Training:
    • Initial training was performed on the 2.85GB crawled psychology corpus for domain knowledge acquisition.
    • Fine-tuning was then conducted on the 56,000 Q&A pairs from the PsyQA dataset to improve the model's ability to generate helpful, structured responses.
    • PanGu 350M was trained on the OpenI platform using a V100 GPU, with a batch size of 8 for 100,000 iterations until convergence.
    • WenZhong 110M was trained in a Jupyter Notebook environment with 64GB memory and an RTX3060 GPU, also using early stopping based on validation loss.
  • Web Interface: A cloud-based distributed architecture was developed for the online platform.
    • Technologies: ReactJS for the front-end (deployed on AWS Amplify), Flask/Python for the backend API and model runtime (on Amazon EC2 instances), Apache for reverse proxying, Google Domain for DNS, and Let's Encrypt/Certbot for HTTPS encryption.
    • Architecture: The system separates front-end, back-end, and computing servers into modular components communicating via APIs, prioritizing scalability, ease of upgrade, and security (HTTPS, TLS encryption).
    • User Interface: A simple web interface allows users to input questions, see a loading status, and receive model-generated answers, including a rating system for user feedback.
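The data-cleaning steps listed above (deduplication, URL and repeated-punctuation removal, a minimum-length filter) can be sketched as a small pipeline. This is a minimal illustration, not the authors' code; the regexes and helper names are my own, and the traditional-to-simplified conversion step (which would typically use a library such as OpenCC) is only indicated by a comment.

```python
import re

MIN_LENGTH = 150  # the paper discards samples shorter than 150 characters

URL_RE = re.compile(r"https?://\S+")
REPEAT_PUNCT_RE = re.compile(r"([!?.,。，！？])\1+")

def clean_sample(text: str) -> str:
    """Per-sample cleaning: strip URLs and collapse repeated punctuation."""
    text = URL_RE.sub("", text)
    text = REPEAT_PUNCT_RE.sub(r"\1", text)
    return text.strip()

def clean_corpus(samples):
    """Clean, deduplicate, and length-filter a list of raw text samples."""
    seen = set()
    cleaned = []
    for raw in samples:
        text = clean_sample(raw)
        # Traditional-to-simplified Chinese conversion (e.g. via OpenCC)
        # would be applied here; omitted to keep the sketch dependency-free.
        if len(text) < MIN_LENGTH or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

Removing user names and advertisements would need platform-specific rules, so those steps are left out of this sketch.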
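Both training runs above use early stopping on validation loss. The paper does not give the exact criterion, so the following is a generic patience-based helper of the kind commonly used; the class name and parameters are illustrative, not from the paper.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` evaluations."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # how many non-improving evals to tolerate
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss: float) -> bool:
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

In a training loop, `if stopper.step(val_loss): break` after each validation pass reproduces the "train until convergence" behavior described for both models.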
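The request flow of the backend API described above (a Flask endpoint that validates a user question and returns a model-generated answer) can be sketched framework-independently. Everything here is assumed: `handle_generate`, the JSON field names, and the `generate_answer` stand-in for the PanGu/WenZhong inference call are illustrative, not the paper's actual API.

```python
import json

def generate_answer(question: str) -> str:
    """Stand-in for the model runtime call; a real backend would invoke the LLM here."""
    return f"[model answer to: {question}]"

def handle_generate(request_body: str) -> dict:
    """Validate a JSON request and build the response payload the web UI would render."""
    try:
        payload = json.loads(request_body)
        question = payload["question"].strip()
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return {"status": 400, "error": "request must be JSON with a 'question' field"}
    if not question:
        return {"status": 400, "error": "question must be non-empty"}
    return {"status": 200, "answer": generate_answer(question)}
```

In the paper's stack this function would be wired to a Flask route on the EC2 backend, with the ReactJS front end posting the question and displaying the returned answer.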

Evaluation:

Both intrinsic and human evaluation methods were used:

  • Intrinsic Evaluation: Measured using Perplexity, ROUGE-L, Distinct-1, and Distinct-2. The results consistently showed that the PanGu model (Perplexity 34.56, ROUGE-L 28.18, Distinct-1 4.57, Distinct-2 12.74) outperformed the WenZhong model (Perplexity 38.40, ROUGE-L 23.56, Distinct-1 3.55, Distinct-2 9.67) on the held-out test set, indicating better language generation quality, similarity to reference text, and diversity.
  • Human Evaluation: Six psychology students evaluated 200 Q&A pairs.
    • Method 1 (Comparing AI responses): Evaluators scored PanGu and WenZhong responses to the same question on Helpfulness, Fluency, Relevance, and Logic (scale 1-5). PanGu received higher average scores across all metrics (Helpfulness 3.87, Fluency 4.36, Relevance 4.09, Logic 3.83) compared to WenZhong (Helpfulness 3.56, Fluency 4.14, Relevance 3.87, Logic 3.63).
    • Method 2 (Comparing AI vs. Ground Truth): Predicted answers from both models were evaluated alongside actual answers from the dataset. While PanGu still scored higher than WenZhong, the scores for the actual (human-written) answers were significantly higher across all metrics (Helpfulness 4.52, Fluency 4.83, Relevance 4.72, Logic 4.56). This highlighted the remaining gap between current LLM performance and professional human psychological consultation.
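Two of the intrinsic metrics reported above are easy to state precisely. Distinct-n is the number of unique n-grams divided by the total number of n-grams across all generated texts, and perplexity is the exponentiated mean per-token negative log-likelihood. The sketch below assumes character-level tokens (a common choice for Chinese text); the function names are mine, not the paper's.

```python
import math

def distinct_n(texts, n):
    """Distinct-n: unique n-grams / total n-grams across all generations."""
    total, unique = 0, set()
    for t in texts:
        tokens = list(t)  # character-level tokenization
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log base)."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Lower perplexity and higher Distinct-1/Distinct-2 are better, which is the direction in which PanGu beats WenZhong in the table of results above.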

Discussion, Limitations, and Future Work:

The authors acknowledge several limitations and areas for future improvement:

  • Model Capability & Real-World Usage: Psy-LLM, as a text-based model, lacks the ability to process nonverbal cues, which are crucial in counseling. Building rapport is also a challenge. The system is best viewed as an assistive tool for human counselors rather than a replacement. Integrating computer vision (e.g., facial emotion detection) is suggested for future unified systems.
  • Data Collection: Challenges included anti-crawler mechanisms and the difficulty of standardizing and thoroughly cleaning the large volume of crawled data. More robust crawling and advanced NLP-based cleaning are needed.
  • Model Improvement: Both PanGu and WenZhong, being autoregressive, suffer from unidirectional context limitations and potential exposure bias during training. The quality of the dataset and limited computing resources for training/tuning were identified as hindrances. Exploring alternative architectures (bidirectional, knowledge-infused) and better tokenization methods for Chinese could help.
  • User Experience and Interface: Future work should focus on enhancing UI simplicity, responsiveness, personalization features, and incorporating user feedback loops.
  • Ethical Considerations and User Privacy: Robust privacy mechanisms (consent, anonymization, access controls), handling sensitive queries safely, avoiding harmful advice, integrating reporting systems, and monitoring for biases are critical for practical deployment.

Conclusion:

Despite limitations, the project successfully demonstrates the feasibility of using fine-tuned LLMs like PanGu and WenZhong to create an AI-based assistive tool for mental health Q&A. The implemented web platform provides an accessible prototype, showing potential for streamlining mental health support. The results offer valuable insights for future research at the intersection of NLP and psychology, emphasizing the need for improved data quality, model architectures, and careful consideration of ethical and practical deployment challenges to better support global mental well-being.

Authors (7)
  1. Tin Lai
  2. Yukun Shi
  3. Zicong Du
  4. Jiajie Wu
  5. Ken Fu
  6. Yichao Dou
  7. Ziqi Wang
Citations (32)