AI-Powered Early Diagnosis of Mental Health Disorders from Real-World Clinical Conversations (2510.14937v1)

Published 16 Oct 2025 in cs.CL

Abstract: Mental health disorders remain among the leading causes of disability worldwide, yet conditions such as depression, anxiety, and Post-Traumatic Stress Disorder (PTSD) are frequently underdiagnosed or misdiagnosed due to subjective assessments, limited clinical resources, and stigma and low awareness. In primary care settings, studies show that providers misidentify depression or anxiety in over 60% of cases, highlighting the urgent need for scalable, accessible, and context-aware diagnostic tools that can support early detection and intervention. In this study, we evaluate the effectiveness of machine learning models for mental health screening using a unique dataset of 553 real-world, semi-structured interviews, each paired with ground-truth diagnoses for major depressive episodes (MDE), anxiety disorders, and PTSD. We benchmark multiple model classes, including zero-shot prompting with GPT-4.1 Mini and Meta-LLaMA, as well as fine-tuned RoBERTa models using Low-Rank Adaptation (LoRA). Our models achieve over 80% accuracy across diagnostic categories, with especially strong performance on PTSD (up to 89% accuracy and 98% recall). We also find that using shorter, focused context segments improves recall, suggesting that focused narrative cues enhance detection sensitivity. LoRA fine-tuning proves both efficient and effective, with lower-rank configurations (e.g., rank 8 and 16) maintaining competitive performance across evaluation metrics. Our results demonstrate that LLM-based models can offer substantial improvements over traditional self-report screening tools, providing a path toward low-barrier, AI-powered early diagnosis. This work lays the groundwork for integrating machine learning into real-world clinical workflows, particularly in low-resource or high-stigma environments where access to timely mental health care is most limited.

Summary

The paper applies multi-label classification to 553 real-world clinical interviews to detect MDE, anxiety disorders, and PTSD.
It compares zero-shot LLMs like GPT-4.1 Mini and Meta-LLaMA with LoRA-adapted RoBERTa, highlighting trade-offs between recall and precision.
The findings indicate that AI methods can offer scalable and low-barrier diagnostics, especially in resource-constrained clinical settings.


Introduction

The paper "AI-Powered Early Diagnosis of Mental Health Disorders from Real-World Clinical Conversations" investigates how machine learning and natural language processing techniques can be used to diagnose mental health disorders from clinical interviews. It argues for scalable and accessible diagnostic tools and evaluates candidate models on a dataset of semi-structured interviews paired with clinical diagnoses for major depressive episodes (MDE), anxiety disorders, and PTSD.

Dataset and Methodology

The dataset comprises 553 real-world, semi-structured psychiatric interviews. Each interview is paired with ground-truth clinical diagnoses for MDE, anxiety disorders, and PTSD. The paper frames the problem as a multi-label text classification task: a single interview can be positive for any combination of the three conditions. Several model classes were evaluated on this task, including zero-shot prompting with GPT-4.1 Mini and Meta-LLaMA, as well as a RoBERTa model fine-tuned with Low-Rank Adaptation (LoRA).
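
The multi-label framing and the zero-shot setup can be illustrated with a minimal sketch. The prompt wording, the JSON reply format, the label ordering, and the helper names `build_prompt` and `parse_reply` are all assumptions made for illustration; the summary does not give the paper's actual prompt, and no model is called here.

```python
import json

# The three diagnoses named in the text; the ordering is an assumption.
LABELS = ["MDE", "anxiety", "PTSD"]


def build_prompt(transcript: str) -> str:
    """Build a hypothetical zero-shot prompt (not the paper's wording)."""
    keys = ", ".join(f'"{name}"' for name in LABELS)
    return (
        "You are assisting with mental health screening.\n"
        "Read the interview transcript and answer with a JSON object "
        f"whose keys are {keys} and whose values are true or false.\n\n"
        f"Transcript:\n{transcript}\n"
    )


def parse_reply(reply: str) -> list[int]:
    """Turn a JSON reply into a multi-hot vector aligned with LABELS."""
    data = json.loads(reply)
    # A missing key is treated as a negative prediction (an assumption).
    return [1 if data.get(name) is True else 0 for name in LABELS]


# Made-up transcript fragment and reply, used only to exercise the helpers.
prompt = build_prompt("Interviewer: How have you been sleeping lately?")
vector = parse_reply('{"MDE": true, "anxiety": false, "PTSD": true}')
print(vector)  # [1, 0, 1]
```

Because each label is decided independently, one interview can map to any of the eight possible label combinations, which is what distinguishes multi-label from multi-class classification.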

Figure 1: Overview of dataset, LLM adaptation methods, and prediction targets.

Model Evaluation

The paper benchmarks multiple model classes for their diagnostic effectiveness:

Zero-shot LLMs:
- GPT-4.1 Mini and Meta-LLaMA-3-8B: These models are prompted directly with interview transcripts and require no task-specific tuning. They achieve high recall but lower precision, which indicates a tendency toward false positives.
Fine-tuning with Low-Rank Adaptation (LoRA):
- RoBERTa models fine-tuned with LoRA performed competitively across evaluation metrics. Lower-rank configurations (e.g., rank 8 and 16) kept fine-tuning efficient while maintaining that performance, with reported gains in both accuracy and recall.
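
To give a sense of why low ranks are efficient, the sketch below counts trainable parameters for one weight matrix under LoRA, where a frozen matrix W of shape d_out x d_in is updated through two small factors B (d_out x r) and A (r x d_in). The hidden size of 768 corresponds to roberta-base; which RoBERTa size and which layers the paper adapts are assumptions here, and the function name is hypothetical.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


HIDDEN = 768  # roberta-base hidden size; the paper's model size is an assumption
full = HIDDEN * HIDDEN  # parameters in one full square projection matrix

for rank in (8, 16):
    lora = lora_trainable_params(HIDDEN, HIDDEN, rank)
    print(f"rank={rank}: {lora} trainable vs {full} full ({lora / full:.1%})")
# rank=8: 12288 trainable vs 589824 full (2.1%)
# rank=16: 24576 trainable vs 589824 full (4.2%)
```

Per adapted matrix, rank 8 trains roughly 2% of the parameters that full fine-tuning would, and doubling the rank doubles that cost.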

Decoder-based LLMs improved substantially on traditional self-report screening tools, with accuracy above 80% across diagnostic categories. Performance was strongest on PTSD, reaching up to 89% accuracy and 98% recall.
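
Accuracy, precision, and recall are computed per diagnosis from binary predictions. The sketch below shows the standard definitions on made-up toy labels (not data from the paper) and illustrates how a model can reach perfect recall while precision stays lower because of false positives.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, and recall for one diagnostic label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy example: every true case is caught, but two healthy cases are flagged.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.6, 'recall': 1.0}
```

In a screening setting, high recall means few missed cases, while lower precision means more people are referred for follow-up who turn out not to have the condition.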

Analysis and Implications

Shorter, focused context segments were found to improve recall, which suggests that focused narrative cues enhance detection sensitivity. This insight can guide the development of more context-aware models. The paper also presents LLM-based methods as lower-barrier than traditional approaches because they do not depend on clinical infrastructure, which makes them candidates for deployment in low-resource settings.
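
One way to work with shorter context segments is sketched below. The summary does not describe how the paper segments interviews or combines segment-level outputs, so the fixed-size windows over speaker turns, the max-style aggregation, the 0.5 threshold, and both function names are assumptions.

```python
def split_into_segments(turns: list, max_turns: int) -> list[list]:
    """Split an interview (a list of speaker turns) into consecutive windows."""
    return [turns[i:i + max_turns] for i in range(0, len(turns), max_turns)]


def aggregate_any(segment_scores: list[float], threshold: float = 0.5) -> int:
    """Flag the interview if any single segment crosses the threshold."""
    return int(max(segment_scores) >= threshold)


# Five placeholder turns split into windows of two.
segments = split_into_segments(["t1", "t2", "t3", "t4", "t5"], max_turns=2)
print(segments)  # [['t1', 't2'], ['t3', 't4'], ['t5']]

# Hypothetical per-segment scores from some classifier; one segment is strong.
print(aggregate_any([0.1, 0.7, 0.2]))  # 1
```

Flagging an interview when any one segment is positive favors sensitivity: a single strong cue is enough to trigger detection, at the possible cost of more false positives.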

Challenges

The paper highlights several challenges, including:

Data Imbalance: There is a skew in diagnostic labels, affecting sensitivity and reliability. Strategies like reweighting and oversampling are suggested to counteract these imbalances.
Domain-specific Adaptation: While LLMs handled natural language well, their understanding of specific psychiatric contexts lagged. This could be mitigated with domain adaptation or by using models pretrained on psychiatric corpora.
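
The reweighting strategy mentioned under Data Imbalance can be sketched as follows. A common convention, used for example by the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`, is to weight each label's positive examples by the ratio of negatives to positives. The label matrix below is made up, and whether the paper uses this exact scheme is an assumption.

```python
def positive_class_weights(label_matrix: list[list[int]]) -> list[float]:
    """Per-label weight = negatives / positives, from multi-hot label rows."""
    n_rows = len(label_matrix)
    weights = []
    for column in zip(*label_matrix):  # iterate over labels, not interviews
        positives = sum(column)
        negatives = n_rows - positives
        # Fall back to a neutral weight if a label has no positive examples.
        weights.append(negatives / positives if positives else 1.0)
    return weights


# Toy multi-hot labels, columns ordered as (MDE, anxiety, PTSD).
labels = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
weights = positive_class_weights(labels)
print(weights)  # [2.0, 5.0, 1.0]
```

Rarer labels receive larger weights, so errors on their positive cases contribute more to the training loss.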

Conclusion

This work demonstrates the potential of machine learning, particularly LLMs, to support mental health diagnosis in real-world clinical settings. The models are reported to significantly outperform traditional self-report screening tools, laying a foundation for AI-assisted early detection of mental health disorders. Further research could focus on strengthening the models' domain understanding and on addressing label imbalance and narrative context processing. The findings underscore the promise of AI for low-barrier, scalable mental health diagnostics, particularly in high-stigma or resource-constrained environments.