MentaLLaMA: Interpretable Mental Health LLM
- MentaLLaMA is an open-source large language model series tailored for mental health analysis that uses domain-specific instruction tuning to generate interpretable rationales.
- It builds on LLaMA-2 architectures and is fine-tuned on the IMHI dataset, which combines social media data to predict conditions like depression and suicide risk.
- The model demonstrates strong performance in risk summarization and clinical NLP, achieving high recall and precision in evidence extraction for suicide risk assessment.
MentaLLaMA is an open-source LLM series tailored for interpretable mental health analysis on social media and other short-form text corpora. Building directly on LLaMA-2 architectures, it incorporates instruction tuning with expert-validated explanations, task labels, and psychological best practices, with the aim of generating both accurate predictions (e.g., depression, stress, suicide risk) and human-interpretable rationales. MentaLLaMA has been widely adopted for evidence extraction, risk summarization, and pragmatic reasoning in clinical NLP and forms the basis of downstream systems for suicide risk assessment and emotional support chatbots (Yang et al., 2023; Tanaka et al., 2024; Oram et al., 2025).
1. Model Architecture and Variants
MentaLLaMA is based on LLaMA-2, a Transformer decoder model with standard architectural hyperparameters for the 7B and 13B scales: 32–40 transformer layers, hidden sizes of 4,096–5,120, and 32–40 attention heads. No architectural modifications are made; all improvements derive from domain- and task-specific instruction tuning (Yang et al., 2023; Tanaka et al., 2024). The main variants are:
| Model Variant | Base | Size | Instruction Tuning |
|---|---|---|---|
| MentaLLaMA-7B | LLaMA2 | 7B | IMHI dataset |
| MentaLLaMA-chat-7B | LLaMA2-chat | 7B | IMHI dataset |
| MentaLLaMA-chat-13B | LLaMA2-chat | 13B | IMHI dataset |
The pretraining corpus includes web text (e.g., CommonCrawl, The Pile) augmented with dialogue and code. The core pretraining objective is autoregressive maximum likelihood:

$$\mathcal{L}_{\text{pretrain}}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t})$$

Instruction tuning on the interpretable mental health instruction (IMHI) dataset employs conditional text generation with a cross-entropy objective (Yang et al., 2023).
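As a minimal sketch of this conditional cross-entropy objective, the loss can be computed over the response tokens only, with the prompt masked out; the function name and masking convention below are illustrative, not the released training code:

```python
import math

def conditional_nll(token_logprobs, prompt_len):
    """Mean negative log-likelihood over the response tokens only.

    token_logprobs: per-token log-probabilities for the full
    (prompt + response) sequence. Prompt tokens are masked out so
    the cross-entropy covers only the rationale+answer, matching
    a conditional text-generation objective.
    """
    response_lps = token_logprobs[prompt_len:]
    return -sum(response_lps) / len(response_lps)

# Toy example: 3 prompt tokens, 2 response tokens.
lps = [math.log(0.9), math.log(0.8), math.log(0.7),
       math.log(0.5), math.log(0.25)]
loss = conditional_nll(lps, prompt_len=3)  # ≈ 1.0397
```

In practice this masking is typically done by setting the labels of prompt positions to an ignore index before computing the token-level cross-entropy.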
2. IMHI Dataset: Multi-Task, Multi-Source Foundation
The IMHI dataset underpins MentaLLaMA’s instruction tuning. It combines 10 existing corpora spanning 8 mental health analysis tasks (e.g., depression detection, stress, suicide risk, wellness dimensions, risk factors) on Reddit, Twitter, and SMS data. Altogether, the dataset comprises ∼105,000 (post, label, expert-prompt, explanation) instances. Explanations are generated using few-shot ChatGPT prompting grounded in expert-written templates. Labels and explanations cover a spectrum of disorders, causes, and risk factors (Yang et al., 2023).
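An IMHI-style instance can be assembled into an instruction-tuning pair along the following lines; the template strings, field names, and example text are hypothetical, not the expert-written prompts themselves:

```python
def build_imhi_example(post, label, expert_prompt, explanation):
    """Assemble one (post, label, expert-prompt, explanation)
    instance into an input/output pair: the model is conditioned
    on the expert prompt plus post, and trained to emit the
    label together with its rationale."""
    source = f"{expert_prompt}\nPost: {post}"
    target = f"Answer: {label}. Reasoning: {explanation}"
    return {"input": source, "output": target}

# Hypothetical instance for illustration only.
example = build_imhi_example(
    post="I can't sleep and nothing feels worth doing anymore.",
    label="depression",
    expert_prompt="Does the poster suffer from depression? Explain.",
    explanation="The post describes anhedonia and sleep disturbance.",
)
```

The key design point is that the rationale is part of the supervised target, so the model learns to justify its label rather than only predict it.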
Instructions and sample explanations were validated for correctness (agreement with gold labels), consistency (task-specific classifier accuracy), and quality (BART-score vs. gold rationales), with median expert evaluations ≥2.5/3 for consistency and ≥2.0/3 for reliability, professionality, and overall (Yang et al., 2023).
3. Instruction-Tuning and Optimization
MentaLLaMA is fine-tuned on IMHI via a conditional generation framework. The training objective minimizes

$$\mathcal{L}(\theta) = -\sum_{(x,\,y)} \log p_\theta(y \mid x) + \lambda\, \mathcal{R}(\theta),$$

where $x$ is the prompt (task + post), $y$ is the rationale+answer, and $\lambda$ is the regularization coefficient. Optimization uses AdamW, a batch size of 256, a maximum sequence length of 2048, and linear learning-rate warmup. Training runs on 4×A100 GPUs with Flash-Attention (Yang et al., 2023). No auxiliary classification heads, ranking losses, or other architectural changes are introduced.
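The linear warmup schedule mentioned above can be sketched as follows; the warmup length and peak learning rate in the example are illustrative values, not ones reported by the authors:

```python
def linear_warmup_lr(step, warmup_steps, peak_lr):
    """Linear warmup: the learning rate ramps from 0 to peak_lr
    over warmup_steps, then (in this minimal sketch) stays flat.
    Real schedules typically add a decay phase after warmup."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr

# Illustrative values: halfway through a 100-step warmup.
lr = linear_warmup_lr(step=50, warmup_steps=100, peak_lr=2e-4)
```

Warmup of this kind is standard for AdamW fine-tuning of large Transformers, stabilizing early updates when gradient statistics are still noisy.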
4. Evaluation: Correctness and Explanation Quality
MentaLLaMA and its chat variants are evaluated on a 10-dataset IMHI-held-out benchmark comprising stress, depression, suicide risk, loneliness, stress causes, and risk factors. Evaluation metrics are:
- Classification correctness: weighted F1 (labels extracted from rationales via a MentalBERT-based classifier for non-templated outputs).
- Explanation quality: BART-score and human expert ratings.
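The label-extraction step can be approximated with a simple keyword matcher; the paper uses a MentalBERT-based classifier for this, so the function below is only an illustrative stand-in:

```python
import re

def extract_label(rationale, candidate_labels):
    """Simplified stand-in for classifier-based label extraction:
    return the first candidate label (in priority order) that the
    generated rationale mentions, or None if none appears."""
    text = rationale.lower()
    for label in candidate_labels:
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", text):
            return label
    return None

pred = extract_label(
    "The poster shows clear signs of depression, not mere stress.",
    ["depression", "stress", "suicide risk"],
)
```

A trained classifier is preferred in practice because rationales often paraphrase the label ("low mood", "hopelessness") rather than naming it verbatim, which a keyword match would miss.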
Comparative results demonstrate that MentaLLaMA-chat-13B matches or is within 5 points of discriminative SOTA (MentalRoBERTa) on 7/10 tasks, and consistently outperforms T5/BART/ChatGPT generative baselines in explanation quality by 0.2–0.5 BART-score (Yang et al., 2023). Human scoring reflects high fluency and coherence, though a residual gap in “professionality” versus ChatGPT remains.
5. Empirical Performance Across Downstream Applications
MentaLLaMA serves as the generative engine in high-stakes clinical NLP pipelines:
- In suicide risk evidence summarization, MentaLLaMA is combined with BERT-based risk extraction and phrase dictionaries. The integrated system achieves strong recall and precision for highlight extraction, ranking first for recall in the CLPsych 2024 shared task (Tanaka et al., 2024).
- In pragmatic reasoning benchmarks (P-ReMIS), MentaLLaMA-7B underperforms more generalist instruction-tuned 7B LLMs (Mistral, Qwen) in agreement detection, implicature, and presupposition NLI, with maximum accuracy 0.52 (vs. Mistral/Qwen ≥0.90) and limited benefit from chain-of-thought prompting. This is attributed to possible overspecialization in surface empathy and to data distribution mismatch (Oram et al., 2025).
6. Extensions, Critiques, and Best Practices
Domain-specialized instruction tuning enables state-of-the-art explainability, but limitations persist:
- MentaLLaMA’s professionality lags behind models trained with explicit clinical corpora or additional retrieval (e.g., PHQ-9, psychiatry notes). Automatic metrics (e.g., BART-score) correlate only moderately with human judgment. Broader instruction-tuning curricula may improve pragmatic inference (Oram et al., 2025).
- The architecture does not currently incorporate multi-modal context, longitudinal user modeling, or external retrieval. Future directions include continual pretraining on clinical guidelines and expansion to multi-modal and longitudinal signals (Yang et al., 2023).
- Ethical guidelines stress that MentaLLaMA-derived assistants (e.g., Sólo Escúchame for Spanish emotional support) are only supplemental and not substitutes for licensed professionals (Ramírez et al., 2024).
7. Open Resources and Community Impact
All MentaLLaMA code, checkpoint weights, and the IMHI dataset are publicly released for transparency and reproducibility. This positions MentaLLaMA as a research platform for computational psychologists, early-warning public health tools, and large-scale mental health discourse monitoring—always with clear human-readable rationales (Yang et al., 2023).
References
- [MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models, (Yang et al., 2023)]
- [Integrating Supervised Extractive and Generative Language Models for Suicide Risk Evidence Summarization, (Tanaka et al., 2024)]
- [P-ReMIS: Pragmatic Reasoning in Mental Health and a Social Implication, (Oram et al., 2025)]
- [Sólo Escúchame: Spanish Emotional Accompaniment Chatbot, (Ramírez et al., 2024)]