MentaLLaMA: Interpretable Mental Health LLM

Updated 27 January 2026
  • MentaLLaMA is an open-source large language model series tailored for mental health analysis that uses domain-specific instruction tuning to generate interpretable rationales.
  • It builds on LLaMA-2 architectures and is fine-tuned with the IMHI dataset, combining social media data to predict conditions like depression and suicide risk.
  • The model demonstrates strong performance in risk summarization and clinical NLP, achieving high recall and precision in evidence extraction for suicide risk assessment.

MentaLLaMA is an open-source LLM series tailored for interpretable mental health analysis on social media and other short-form text corpora. Building directly on LLaMA-2 architectures, it incorporates instruction tuning with expert-validated explanations, task labels, and psychological best practices, with the aim of generating both accurate predictions (e.g., depression, stress, suicide risk) and human-interpretable rationales. MentaLLaMA has been widely adopted for evidence extraction, risk summarization, and pragmatic reasoning in clinical NLP and forms the basis of downstream systems for suicide risk assessment and emotional support chatbots (Yang et al., 2023, Tanaka et al., 2024, Oram et al., 31 Jul 2025).

1. Model Architecture and Variants

MentaLLaMA is based on LLaMA-2, a decoder-only Transformer with standard hyperparameters at the 7B and 13B scales: 32 or 40 transformer layers, hidden sizes of 4,096 or 5,120, and 32 or 40 attention heads, respectively. No architectural modifications are made; all improvements derive from domain- and task-specific instruction tuning (Yang et al., 2023, Tanaka et al., 2024). The main variants are:

Model Variant        | Base        | Size | Instruction Tuning
MentaLLaMA-7B        | LLaMA2      | 7B   | IMHI dataset
MentaLLaMA-chat-7B   | LLaMA2-chat | 7B   | IMHI dataset
MentaLLaMA-chat-13B  | LLaMA2-chat | 13B  | IMHI dataset

The pretraining corpus includes web text (e.g., CommonCrawl, The Pile) augmented with dialogue and code. The core pretraining objective is autoregressive maximum likelihood:

L = -\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i}).

Instruction tuning on the interpretable mental health instruction (IMHI) dataset employs conditional text generation with cross-entropy objectives (Yang et al., 2023).
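The pretraining objective above can be sketched numerically. This toy example (plain Python, no framework assumed) sums token-level negative log-likelihoods given the model's next-token probabilities for an observed sequence:

```python
import math

def autoregressive_nll(token_probs):
    """Negative log-likelihood of a sequence under an autoregressive model.

    token_probs: list of p_theta(x_i | x_<i), i.e. the probability the
    model assigns to each observed token given its prefix.
    """
    return -sum(math.log(p) for p in token_probs)

# A 3-token sequence with these conditional probabilities:
loss = autoregressive_nll([0.5, 0.25, 0.8])
```

Minimizing this quantity over the corpus is exactly the maximum-likelihood objective shown above; in practice the probabilities come from a softmax over the model's vocabulary logits.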

2. IMHI Dataset: Multi-Task, Multi-Source Foundation

The IMHI dataset underpins MentaLLaMA’s instruction tuning. It combines 10 existing corpora spanning 8 mental health analysis tasks (e.g., depression detection, stress, suicide risk, wellness dimensions, risk factors) on Reddit, Twitter, and SMS data. Altogether, the dataset comprises ∼105,000 (post, label, expert-prompt, explanation) instances. Each is generated using few-shot ChatGPT prompting grounded in expert-written templates. Labels and explanations cover a spectrum of disorders, causes, and risk factors (Yang et al., 2023).
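The (post, label, expert-prompt, explanation) structure can be made concrete with a small assembly sketch. The template strings and field order below are illustrative assumptions, not the released IMHI format:

```python
def build_imhi_example(post, question, label, explanation):
    """Assemble one (prompt, target) pair in the spirit of IMHI's
    (post, label, expert-prompt, explanation) instances.

    The exact template is an assumption for illustration; the released
    dataset uses expert-written prompt templates.
    """
    prompt = f"Post: {post}\nQuestion: {question}"
    target = f"Answer: {label}. Reasoning: {explanation}"
    return prompt, target

prompt, target = build_imhi_example(
    "I haven't slept properly in weeks and nothing feels worth doing.",
    "Does the poster show signs of depression?",
    "yes",
    "The post describes persistent sleep disturbance and loss of interest.",
)
```

Each such pair becomes one conditional-generation training instance: the model conditions on the prompt and is trained to emit the label together with its rationale.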

Instructions and sample explanations were validated for correctness (agreement with gold labels), consistency (task-specific classifier accuracy), and quality (BART-score vs. gold rationales), with median expert evaluations ≥2.5/3 for consistency and ≥2.0/3 for reliability, professionality, and overall (Yang et al., 2023).

3. Instruction-Tuning and Optimization

MentaLLaMA is fine-tuned on IMHI via a conditional generation framework. The training objective minimizes

\mathcal{L}(\phi) = -\sum_{(q,r)\in \mathcal{D}} \sum_{j=1}^{|r|} \log P_\phi(r_j \mid q, r_{<j}) + \lambda \|\phi\|_2^2,

where q is the prompt (task instruction + post), r the rationale and answer, and λ the L2 regularization coefficient. Optimization uses AdamW with batch size 256, maximum sequence length 2,048, and linear warmup; training runs on 4×A100 GPUs with Flash-Attention (Yang et al., 2023). No auxiliary classification heads, ranking losses, or other architectural changes are introduced.
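The loss has two pieces: cross-entropy over the rationale tokens only (the prompt q is conditioned on, not predicted) plus an L2 penalty on the parameters. A minimal numeric sketch, with illustrative inputs standing in for real model outputs:

```python
import math

def instruction_tuning_loss(token_probs, prompt_len, params, lam):
    """Conditional-generation loss of the form in Section 3 (illustrative).

    token_probs: model probabilities for every token in the full
        (prompt + rationale) sequence; only rationale tokens
        (index >= prompt_len) enter the cross-entropy term,
        mirroring the sum over r_j conditioned on q.
    params: flat list of parameters phi for the L2 penalty.
    lam: regularization coefficient lambda.
    """
    cross_entropy = -sum(math.log(p) for p in token_probs[prompt_len:])
    l2_penalty = lam * sum(w * w for w in params)
    return cross_entropy + l2_penalty
```

Masking the prompt tokens out of the loss is the standard choice for instruction tuning: the model learns to generate rationales and answers, not to reproduce the instruction.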

4. Evaluation: Correctness and Explanation Quality

MentaLLaMA and its chat variants are evaluated on a held-out IMHI benchmark spanning 10 datasets that cover stress, depression, suicide risk, loneliness, stress causes, and risk factors. Evaluation metrics are:

  • Classification correctness: weighted F1 (labels extracted from rationales via MentalBERT for non-templatic outputs).
  • Explanation quality: BART-score and human expert ratings.
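Support-weighted F1 averages per-class F1 scores weighted by how often each class occurs. A pure-Python stand-in for sklearn's `f1_score(average="weighted")`:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted F1, as used for IMHI classification correctness.

    Each class's F1 is weighted by its count in y_true, so frequent
    classes dominate the average.
    """
    support = Counter(y_true)
    total = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += support[c] * f1
    return total / len(y_true)
```

Extracting the predicted label from a free-form rationale (the MentalBERT step mentioned above) happens before this metric is applied.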

Comparative results demonstrate that MentaLLaMA-chat-13B matches or is within 5 F1 points of the discriminative SOTA (MentalRoBERTa) on 7/10 tasks, and consistently outperforms T5/BART/ChatGPT generative baselines in explanation quality by 0.2–0.5 BART-score (Yang et al., 2023). Human scoring reflects high fluency and coherence, though a residual gap in “professionality” versus ChatGPT remains.

5. Empirical Performance Across Downstream Applications

MentaLLaMA serves as the generative engine in high-stakes clinical NLP pipelines:

  • In suicide risk evidence summarization, MentaLLaMA is combined with BERT-based risk extraction and phrase dictionaries. The integrated system achieves recall R = 0.944 and precision P = 0.906 for highlight extraction, ranking 1st for recall in the CLPsych 2024 shared task (Tanaka et al., 2024).
  • In pragmatic reasoning benchmarks (P-ReMe), MentaLLaMA-7B underperforms more generalist instruction-tuned 7B LLMs (Mistral, Qwen) in agreement detection, implicature, and presupposition NLI, with maximum accuracy 0.52 (vs. Mistral/Qwen ≥0.90) and limited benefit from chain-of-thought prompting. This is attributed to possible overspecialization in surface empathy and data distribution mismatch (Oram et al., 31 Jul 2025).
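The highlight-extraction precision and recall figures can be understood through a token-overlap view. This is a simplified stand-in; the CLPsych 2024 shared task defines its own span-matching protocol:

```python
def highlight_precision_recall(gold_tokens, pred_tokens):
    """Token-overlap precision/recall for evidence-highlight extraction.

    Simplified illustration: precision is the fraction of predicted
    highlight tokens that are gold, recall the fraction of gold
    highlight tokens that were predicted.
    """
    gold, pred = set(gold_tokens), set(pred_tokens)
    overlap = len(gold & pred)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall
```

High recall at some cost in precision (as in the CLPsych result) corresponds to a system that prefers to over-highlight rather than miss evidence, a sensible bias for risk screening.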

6. Extensions, Critiques, and Best Practices

Domain-specialized instruction tuning enables state-of-the-art explainability, but limitations persist:

  • MentaLLaMA’s professionality lags behind models trained on explicit clinical corpora or augmented with retrieval (e.g., PHQ-9 items, psychiatry notes). Automatic metrics such as BART-score correlate only moderately with human judgment. Broader instruction-tuning curricula may improve pragmatic inference (Oram et al., 31 Jul 2025).
  • The architecture does not currently incorporate multi-modal context, longitudinal user modeling, or external retrieval. Future directions include continual pretraining on clinical guidelines and expansion to multi-modal and longitudinal signals (Yang et al., 2023).
  • Ethical guidelines stress that MentaLLaMA-derived assistants (e.g., Sólo Escúchame for Spanish emotional support) are only supplemental and not substitutes for licensed professionals (Ramírez et al., 2024).

7. Open Resources and Community Impact

All MentaLLaMA code, checkpoint weights, and the IMHI dataset are publicly released for transparency and reproducibility. This positions MentaLLaMA as a research platform for computational psychologists, early-warning public health tools, and large-scale mental health discourse monitoring—always with clear human-readable rationales (Yang et al., 2023).

References

  • [MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models, (Yang et al., 2023)]
  • [Integrating Supervised Extractive and Generative Language Models for Suicide Risk Evidence Summarization, (Tanaka et al., 2024)]
  • [P-ReMIS: Pragmatic Reasoning in Mental Health and a Social Implication, (Oram et al., 31 Jul 2025)]
  • [Sólo Escúchame: Spanish Emotional Accompaniment Chatbot, (Ramírez et al., 2024)]
