MentaLLaMA: Interpretable Mental Health LLM
- MentaLLaMA is an open-source large language model series tailored for mental health analysis that uses domain-specific instruction tuning to generate interpretable rationales.
- It builds on LLaMA-2 architectures and is fine-tuned on the IMHI dataset, which combines social media data to predict conditions like depression and suicide risk.
- The model demonstrates strong performance in risk summarization and clinical NLP, achieving high recall and precision in evidence extraction for suicide risk assessment.
MentaLLaMA is an open-source LLM series tailored for interpretable mental health analysis on social media and other short-form text corpora. Building directly on LLaMA-2 architectures, it incorporates instruction tuning with expert-validated explanations, task labels, and psychological best practices, with the aim of generating both accurate predictions (e.g., depression, stress, suicide risk) and human-interpretable rationales. MentaLLaMA has been widely adopted for evidence extraction, risk summarization, and pragmatic reasoning in clinical NLP and forms the basis of downstream systems for suicide risk assessment and emotional support chatbots (Yang et al., 2023; Tanaka et al., 2024; Oram et al., 2025).
1. Model Architecture and Variants
MentaLLaMA is based on LLaMA-2, a Transformer decoder model with standard architectural hyperparameters for the 7B and 13B scales: 32–40 transformer layers, hidden sizes of 4,096–5,120, and 32–40 attention heads. No architectural modifications are made; all improvements derive from domain- and task-specific instruction tuning (Yang et al., 2023; Tanaka et al., 2024). The main variants are:
| Model Variant | Base | Size | Instruction Tuning |
|---|---|---|---|
| MentaLLaMA-7B | LLaMA2 | 7B | IMHI dataset |
| MentaLLaMA-chat-7B | LLaMA2-chat | 7B | IMHI dataset |
| MentaLLaMA-chat-13B | LLaMA2-chat | 13B | IMHI dataset |
The pretraining corpus includes web text (e.g., CommonCrawl, The Pile) augmented with dialogue and code. The core pretraining objective is autoregressive maximum likelihood:

$$\mathcal{L}_{\text{pretrain}}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t})$$

Instruction tuning on the interpretable mental health instruction (IMHI) dataset employs conditional text generation with a cross-entropy objective (Yang et al., 2023).
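As a minimal sketch of this conditional cross-entropy objective, the loss can be computed over the response tokens only, with the prompt masked out; the function name and masking convention below are illustrative, not the released training code:

```python
import math

def conditional_nll(token_logprobs, prompt_len):
    """Mean negative log-likelihood over the response tokens only.

    token_logprobs: per-token log-probabilities for the full
    (prompt + response) sequence. Prompt tokens are masked out so
    the cross-entropy covers only the rationale+answer, matching
    a conditional text-generation objective.
    """
    response_lps = token_logprobs[prompt_len:]
    return -sum(response_lps) / len(response_lps)

# Toy example: 3 prompt tokens, 2 response tokens.
lps = [math.log(0.9), math.log(0.8), math.log(0.7),
       math.log(0.5), math.log(0.25)]
loss = conditional_nll(lps, prompt_len=3)  # ≈ 1.0397
```

In practice this masking is typically done by setting the labels of prompt positions to an ignore index before computing the token-level cross-entropy.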
2. IMHI Dataset: Multi-Task, Multi-Source Foundation
The IMHI dataset underpins MentaLLaMA’s instruction tuning. It combines 10 existing corpora spanning 8 mental health analysis tasks (e.g., depression detection, stress, suicide risk, wellness dimensions, risk factors) on Reddit, Twitter, and SMS data. Altogether, the dataset comprises ∼105,000 (post, label, expert-prompt, explanation) instances. Explanations are generated using few-shot ChatGPT prompting grounded in expert-written templates. Labels and explanations cover a spectrum of disorders, causes, and risk factors (Yang et al., 2023).
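An IMHI-style instance can be assembled into an instruction-tuning pair along the following lines; the template strings, field names, and example text are hypothetical, not the expert-written prompts themselves:

```python
def build_imhi_example(post, label, expert_prompt, explanation):
    """Assemble one (post, label, expert-prompt, explanation)
    instance into an input/output pair: the model is conditioned
    on the expert prompt plus post, and trained to emit the
    label together with its rationale."""
    source = f"{expert_prompt}\nPost: {post}"
    target = f"Answer: {label}. Reasoning: {explanation}"
    return {"input": source, "output": target}

# Hypothetical instance for illustration only.
example = build_imhi_example(
    post="I can't sleep and nothing feels worth doing anymore.",
    label="depression",
    expert_prompt="Does the poster suffer from depression? Explain.",
    explanation="The post describes anhedonia and sleep disturbance.",
)
```

The key design point is that the rationale is part of the supervised target, so the model learns to justify its label rather than only predict it.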
Instructions and sample explanations were validated for correctness (agreement with gold labels), consistency (task-specific classifier accuracy), and quality (BART-score vs. gold rationales), with median expert evaluations ≥2.5/3 for consistency and ≥2.0/3 for reliability, professionality, and overall (Yang et al., 2023).
3. Instruction-Tuning and Optimization
MentaLLaMA is fine-tuned on IMHI via a conditional generation framework. The training objective minimizes

$$\mathcal{L}(\theta) = -\sum_{(x,\,y)} \log p_\theta(y \mid x) + \lambda\, \mathcal{R}(\theta),$$

where $x$ is the prompt (task + post), $y$ is the rationale+answer, and $\lambda$ is the regularization coefficient. Optimization uses AdamW, a batch size of 256, a maximum sequence length of 2048, and linear learning-rate warmup. Training runs on 4×A100 GPUs with Flash-Attention (Yang et al., 2023). No auxiliary classification heads, ranking losses, or other architectural changes are introduced.
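The linear warmup schedule mentioned above can be sketched as follows; the warmup length and peak learning rate in the example are illustrative values, not ones reported by the authors:

```python
def linear_warmup_lr(step, warmup_steps, peak_lr):
    """Linear warmup: the learning rate ramps from 0 to peak_lr
    over warmup_steps, then (in this minimal sketch) stays flat.
    Real schedules typically add a decay phase after warmup."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr

# Illustrative values: halfway through a 100-step warmup.
lr = linear_warmup_lr(step=50, warmup_steps=100, peak_lr=2e-4)
```

Warmup of this kind is standard for AdamW fine-tuning of large Transformers, stabilizing early updates when gradient statistics are still noisy.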
4. Evaluation: Correctness and Explanation Quality
MentaLLaMA and its chat variants are evaluated on a 10-dataset IMHI-held-out benchmark comprising stress, depression, suicide risk, loneliness, stress causes, and risk factors. Evaluation metrics are:
- Classification correctness: weighted F1 (labels extracted from rationales via a MentalBERT-based classifier for non-templated outputs).
- Explanation quality: BART-score and human expert ratings.
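The label-extraction step can be approximated with a simple keyword matcher; the paper uses a MentalBERT-based classifier for this, so the function below is only an illustrative stand-in:

```python
import re

def extract_label(rationale, candidate_labels):
    """Simplified stand-in for classifier-based label extraction:
    return the first candidate label (in priority order) that the
    generated rationale mentions, or None if none appears."""
    text = rationale.lower()
    for label in candidate_labels:
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", text):
            return label
    return None

pred = extract_label(
    "The poster shows clear signs of depression, not mere stress.",
    ["depression", "stress", "suicide risk"],
)
```

A trained classifier is preferred in practice because rationales often paraphrase the label ("low mood", "hopelessness") rather than naming it verbatim, which a keyword match would miss.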
Comparative results demonstrate that MentaLLaMA-chat-13B matches or is within 5 points of discriminative SOTA (MentalRoBERTa) on 7/10 tasks, and consistently outperforms T5/BART/ChatGPT generative baselines in explanation quality by 0.2–0.5 BART-score (Yang et al., 2023). Human scoring reflects high fluency and coherence, though a residual gap in “professionality” versus ChatGPT remains.
5. Empirical Performance Across Downstream Applications
MentaLLaMA serves as the generative engine in high-stakes clinical NLP pipelines:
- In suicide risk evidence summarization, MentaLLaMA is combined with BERT-based risk extraction and phrase dictionaries. The integrated system achieves strong recall and precision for highlight extraction, ranking first for recall in the CLPsych 2024 shared task (Tanaka et al., 2024).
- In pragmatic reasoning benchmarks (P-ReMIS), MentaLLaMA-7B underperforms more generalist instruction-tuned 7B LLMs (Mistral, Qwen) in agreement detection, implicature, and presupposition NLI, with maximum accuracy 0.52 (vs. Mistral/Qwen ≥0.90) and limited benefit from chain-of-thought prompting. This is attributed to possible overspecialization in surface empathy and to data distribution mismatch (Oram et al., 2025).
6. Extensions, Critiques, and Best Practices
Domain-specialized instruction tuning enables state-of-the-art explainability, but limitations persist:
- MentaLLaMA’s professionality lags behind models trained with explicit clinical corpora or additional retrieval (e.g., PHQ-9, psychiatry notes). Automatic metrics (e.g., BART-score) correlate only moderately with human judgment. Broader instruction-tuning curricula may improve pragmatic inference (Oram et al., 2025).
- The architecture does not currently incorporate multi-modal context, longitudinal user modeling, or external retrieval. Future directions include continual pretraining on clinical guidelines and expansion to multi-modal and longitudinal signals (Yang et al., 2023).
- Ethical guidelines stress that MentaLLaMA-derived assistants (e.g., Sólo Escúchame for Spanish emotional support) are only supplemental and not substitutes for licensed professionals (Ramírez et al., 2024).
7. Open Resources and Community Impact
All MentaLLaMA code, checkpoint weights, and the IMHI dataset are publicly released for transparency and reproducibility. This positions MentaLLaMA as a research platform for computational psychologists, early-warning public health tools, and large-scale mental health discourse monitoring—always with clear human-readable rationales (Yang et al., 2023).
References
- [MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models, (Yang et al., 2023)]
- [Integrating Supervised Extractive and Generative Language Models for Suicide Risk Evidence Summarization, (Tanaka et al., 2024)]
- [P-ReMIS: Pragmatic Reasoning in Mental Health and a Social Implication, (Oram et al., 2025)]
- [Sólo Escúchame: Spanish Emotional Accompaniment Chatbot, (Ramírez et al., 2024)]