
Interpretable Mental Health Instruction Dataset

Updated 27 January 2026
  • IMHI is a rigorously curated dataset that pairs mental health texts with structured instructions, expert labels, and natural language explanations for interpretability.
  • It supports multi-task analysis including binary detection, multi-label categorization, causal extraction, and counseling dialogue evaluation across multiple languages.
  • It enhances clinical trust by providing interpretable benchmarks and evaluation metrics, facilitating robust model validation and safe real-world deployment.

The Interpretable Mental Health Instruction (IMHI) dataset collectively refers to a family of rigorously curated resources that advance mental health analysis by pairing textual samples with structured instructions, expert-annotated labels, and natural-language decision rationales. IMHI datasets are foundational to the development, instruction tuning, and benchmarking of LLMs for explainable, robust, and clinically relevant mental health inference on both social media and real-world clinical data. These datasets are designed to address the limitations of traditional opaque deep learning models by enabling not only high-accuracy multi-task classification but also providing interpretability essential for clinical trust, model validation, and safe deployment (Yang et al., 2023, Zhai et al., 2024, Yang et al., 2023, Lamparth et al., 22 Feb 2025, Garg et al., 2023, Chen et al., 5 Mar 2025, Haider et al., 22 Jun 2025).

1. Dataset Scope and Motivations

IMHI is engineered for instruction-based learning and explainable benchmarking in mental health NLP. Its core objectives are:

  • Enabling multi-task analysis of mental health–related text, spanning binary detection (e.g., depression, suicide risk), multi-class/multi-label syndrome/disorder categorization, cause/factor and cognitive pathway extraction, and multi-turn counseling strategies.
  • Pairing every inference with interpretable, human-style natural language explanations, either as model-generated rationales or annotator-extracted rationales, allowing downstream validation, error analysis, and clinical application.
  • Providing a transparent testbed for evaluating the reasoning, calibration, and bias of LLMs in diverse and intersectional mental health contexts, with explicit support for multilingual and cross-cultural studies (Yang et al., 2023, Yang et al., 2023, Chen et al., 5 Mar 2025, Zhai et al., 2024).

Interpretability is emphasized to bridge the trust gap in deep learning, to facilitate clinical and research auditing of AI decisions, to support error analysis, and to advance bias mitigation and fairness assessment (Haider et al., 22 Jun 2025).

2. Composition, Task Spectrum, and Formalization

IMHI resources decompose into several major multilingual and task-diverse datasets, the largest totaling over 100,000 instruction–response pairs (Yang et al., 2023). Key instantiations include:

Data Sources and Tasks

| Dataset/Source | Tasks/Labels | Language | # Samples |
|---|---|---|---|
| MentaLLaMA-IMHI (Yang et al., 2023) | 8 tasks (binary/multi-class detection, causes, IRF, etc.) | EN, CH, others | 105,794 |
| C-IMHI (Zhai et al., 2024) | Suicide risk (binary), cognitive distortion (12-way multi-label), cognitive pathway (4+19 hierarchical multi-label) | Chinese, EN (CP only) | 9,251 |
| IMHI small (Yang et al., 2023) | 5 task families (binary, multi-class, causal, ERC, CEE) | English | 163 expl. |
| IRF/Reddit (Garg et al., 2023) | Interpersonal risk factors (TBE/PBU), extractive rationale | English | 3,522 |
| Psy-Insight (Chen et al., 5 Mar 2025) | Counseling dialogues (emotion, strategy, therapy rationale) | Chinese, English | 12,000+ |

Formal Annotation Format

A generic IMHI dataset $D$ is a collection of $(I_i, O_i)$ pairs:

$D = \{ (I_i, O_i) \}_{i=1}^{N}$

Where each instruction consists of a detailed task description, target text (e.g., social media post or dialogue), and explicit query; each output includes structured labels (task-dependent) and a natural-language explanation. Functional fields (as per C-IMHI):

  • $d_i$: task description
  • $t_i$: text/post
  • $q_i$: instruction prompt
  • $c_i \in Y$: label(s) for task with label space $Y$ (class, multi-label, etc.)
  • $e_i$: explanation

Representative JSON schema:

{
  "id": ...,
  "task": "...",
  "instruction": "...",
  "text": "...",
  "query": "...",
  "label": ...,
  "explanation": "..."
}
(Zhai et al., 2024, Yang et al., 2023).
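A loader for records in this schema can make the required fields explicit. The following is a minimal sketch, not code from the dataset releases; the `validate_record` helper and its required-key check are assumptions about how a typical consumer would enforce the schema above.

```python
# Minimal sketch: validating a record against the generic IMHI JSON schema
# shown above. The required-key set mirrors the schema; the non-empty
# explanation check reflects IMHI pairing every label with a rationale.
import json

REQUIRED_KEYS = {"id", "task", "instruction", "text", "query", "label", "explanation"}

def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record is well-formed."""
    problems = ["missing key: " + k for k in sorted(REQUIRED_KEYS - record.keys())]
    if not record.get("explanation"):
        problems.append("empty explanation (IMHI pairs every label with a rationale)")
    return problems

# Hypothetical record following the schema above.
sample = json.loads("""{
  "id": 1,
  "task": "suicide_risk",
  "instruction": "Decide whether the post indicates high suicide risk.",
  "text": "...",
  "query": "Does this post indicate high suicide risk? Explain.",
  "label": "high_risk",
  "explanation": "The post states a direct intent to end everything."
}""")
print(validate_record(sample))  # → []
```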

3. Annotation Protocols and Quality Assurance

IMHI datasets are engineered for high annotator agreement and reliability, following distinctive protocols for different domains:

  • Expert protocols: Annotation by trained clinicians, psychologists, or advanced domain PhDs; peer or group review to resolve conflicts (Lamparth et al., 22 Feb 2025, Garg et al., 2023).
  • Automated filtering: For large-scale, LLM-generated explanations, post hoc correctness (label match), and consistency (classifier-based) checks (Yang et al., 2023, Zhai et al., 2024).
  • Human validation: Random sampling with multi-axis (Consistency, Reliability, Professionality) human scoring on a 0–3 scale; for C-IMHI, Consistency ≈ 2.73, Reliability ≈ 2.30, Professionality ≈ 2.03, with similar metrics for MentaLLaMA and others (Zhai et al., 2024, Yang et al., 2023).
  • Inter-annotator Agreement: Cohen’s κ, Fleiss’ κ, or Krippendorff’s α applied where appropriate. For IRF: κ(TBE)=0.7883, κ(PBU)=0.8239 (Garg et al., 2023). For IMHI-small: κ≥0.41 for most human explanation ratings (Yang et al., 2023).
  • Uncertainty estimation: For ambiguous questions, preference probabilities via hierarchical Bradley–Terry models, soft label distribution, and confidence intervals (Lamparth et al., 22 Feb 2025).
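Cohen's κ, used in the agreement figures above, can be computed from two annotators' label sequences as observed agreement corrected for chance. This is a from-scratch illustration (the dataset papers do not specify their implementation); the example labels are invented, not drawn from IRF.

```python
# Sketch of Cohen's kappa for two annotators' labels: kappa = (p_o - p_e) / (1 - p_e),
# where p_o is observed agreement and p_e is chance agreement from label marginals.
from collections import Counter

def cohens_kappa(a: list, b: list) -> float:
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n  # p_o
    ca, cb = Counter(a), Counter(b)
    # p_e: sum over labels of the product of each annotator's marginal frequency
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical binary annotations (e.g., TBE present / absent) from two raters.
ann1 = [1, 1, 0, 1, 0, 0, 1, 0]
ann2 = [1, 1, 0, 0, 0, 0, 1, 1]
print(round(cohens_kappa(ann1, ann2), 3))  # → 0.5
```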

Explanations may be LLM-generated (validated against expert-written few-shot templates and spot-checked), or extractive/highlighted rationales for clinical interpretability (Zhai et al., 2024, Yang et al., 2023, Garg et al., 2023).

4. Instruction Structure and Representative Examples

IMHI samples couple detailed instructions, test samples, gold labels, and rationales. Examples:

  • Binary:
    • “请判断下面的微博是否表示高自杀风险,并解释理由。” (“Determine whether the following Weibo post indicates high suicide risk, and explain your reasoning.”)
    • Label: “high_risk”
    • Explanation: “‘结束这一切’直接表明自杀意图,且语气坚定,故属于高自杀风险。” (“‘End it all’ directly expresses suicidal intent, in a resolute tone, so this is high suicide risk.”)
  • Multi-label:
    • “请识别这条微博中包含的所有认知扭曲类型,并解释原因。” (“Identify all types of cognitive distortion present in this Weibo post, and explain why.”)
    • Label: [“过度概括” (overgeneralization), “灾难化” (catastrophizing)]
    • Explanation: “‘每次…都’属于过度概括;‘一辈子都没希望’是灾难化思维。” (“‘Every time… always’ is overgeneralization; ‘no hope for my whole life’ is catastrophizing.”)
  • Cognitive Pathway:
    • Extract four parent and nineteen child pathway elements hierarchically, with rationale for each mapping.
  • Counseling Dialogues:
    • Each therapist turn: {(utterance, strategy_label, reasoning)}, each client turn: {(utterance, emotion_label, observation)}, with session-level guidance summaries (Chen et al., 5 Mar 2025).

Such instructive and explanation-rich formatting enables modeling both human-like reasoning and fine-grained multi-task capabilities.
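The coupling of instruction, sample, and query can be sketched as a simple prompt assembler over the functional fields ($d_i$, $t_i$, $q_i$). The template wording below is an assumption for illustration, not the datasets' exact prompt format.

```python
# Illustrative sketch: assembling one instruction prompt from the functional
# fields d_i (task description), t_i (text/post), and q_i (query). The section
# labels "Post:" / "Question:" are assumed, not taken from the datasets.
def build_prompt(task_description: str, text: str, query: str) -> str:
    return task_description + "\nPost: " + text + "\nQuestion: " + query

prompt = build_prompt(
    "You are given a social media post for mental health analysis.",
    "I feel like every attempt fails and nothing will ever change.",
    "Identify all cognitive distortion types present and explain why.",
)
print(prompt)
```

A model's response would then be parsed back into the paired (label, explanation) output described above.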

5. Evaluation Protocols and Benchmarks

Standard IMHI evaluation is multi-level:

  • Classification Metrics: Micro-averaged Precision, Recall, and $F_1$; for multi-class settings, weighted $F_1$, computed from per-class scores

$\text{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \quad \text{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \quad F1_c = 2\,\frac{\text{Prec}_c \cdot \text{Rec}_c}{\text{Prec}_c + \text{Rec}_c}$

(Zhai et al., 2024, Yang et al., 2023).
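The per-class formulas above aggregate into a weighted $F_1$ by support. The following from-scratch sketch makes the arithmetic explicit (library implementations such as scikit-learn's behave equivalently); the example labels are invented.

```python
# Per-class precision/recall/F1 as defined above, then support-weighted F1.
from collections import Counter

def per_class_f1(gold: list, pred: list) -> dict:
    """Map each class c to (Precision_c, Recall_c, F1_c)."""
    scores = {}
    for c in set(gold) | set(pred):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores

def weighted_f1(gold: list, pred: list) -> float:
    """Weight each class's F1 by its gold-label support."""
    support = Counter(gold)
    scores = per_class_f1(gold, pred)
    return sum(support[c] * scores[c][2] for c in support) / len(gold)

gold = ["dep", "dep", "none", "none", "dep"]
pred = ["dep", "none", "none", "none", "dep"]
print(round(weighted_f1(gold, pred), 3))  # → 0.8
```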

  • Explanation Quality: BART-Score, BLEU, ROUGE-L, BERTScore, and human evaluations (0–3 scales for fluency, completeness, reliability) (Yang et al., 2023, Yang et al., 2023).
  • Fairness/Bias Auditing: For intersectional bias, $B_{\text{sent}}$, $B_{\text{dem}}$, and $B_{\text{cond}}$ computed by groupwise comparison, and bias amplification factors $F_k$ through multi-hop QA (Haider et al., 22 Jun 2025).
  • Clinical/ambiguous tasks: Use of soft label calibration scores (Brier, cross-entropy), ambiguity metrics (Krippendorff’s α; BT probabilities) (Lamparth et al., 22 Feb 2025).
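Among the explanation-quality metrics listed above, ROUGE-L scores a generated explanation against a reference via their longest common subsequence. The sketch below is a plain dynamic-programming illustration with naive whitespace tokenization, not the reference scorer; the example sentences are invented.

```python
# Sketch of ROUGE-L: F-measure over the longest common subsequence (LCS)
# of reference and candidate token sequences.
def lcs_len(a: list, b: list) -> int:
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(reference: str, candidate: str) -> float:
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(cand), lcs / len(ref)  # LCS precision and recall
    return 2 * prec * rec / (prec + rec)

print(round(rouge_l("the post shows clear suicidal intent",
                    "the post shows suicidal intent"), 3))  # → 0.909
```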

IMHI-tuned LLMs (e.g., MentaLLaMA, MentalGLM) are benchmarked against BERT-family PLMs, ChatGPT, GPT-4, and open-source LLMs, showing competitive correctness (F1), high explanation quality, and, in some settings, superior performance to zero/few-shot LLMs (Yang et al., 2023, Zhai et al., 2024).

6. Applications and Impact

IMHI frameworks underpin a variety of research and applied efforts:

  • Instruction/LLM Tuning: MentaLLaMA, MentalGLM, and related open-source LLMs for Chinese and English are instruction-tuned on IMHI, improving explainability and robustness (Yang et al., 2023, Zhai et al., 2024).
  • Clinical and Social Media Use: Addressing a wide landscape of conditions and explanatory needs—suicide risk, depression, cognitive distortions, counseling dynamics—and supporting both academic and clinical validation (Garg et al., 2023, Chen et al., 5 Mar 2025).
  • AI Bias Auditing: Systematic, intersectional audit of LLM outputs for amplification/silencing of marginalized perspectives (Haider et al., 22 Jun 2025).
  • Evaluation Tooling: IMHI’s format allows prompt engineering experiments, automatic metric development, and alignment evaluation.
  • Human-in-the-loop Use: Interpretability and explanation generation facilitate safe integration into triage, early warning, documentation, and research settings (Zhai et al., 2024, Lamparth et al., 22 Feb 2025).

7. Limitations and Prospective Extensions

Known limitations:

  • Scarcity of published κ or α metrics for all tasks/languages; several splits rely partially on LLM-simulated expert explanation, which may underperform in "professionality" compared to consistency (Zhai et al., 2024, Yang et al., 2023).
  • Current datasets are skewed toward English and Chinese, with limited cross-lingual coverage (Chen et al., 5 Mar 2025).
  • Clinically validated reasoning quality and calibration remain active research areas, particularly for domain transfer and real-world deployment (Zhai et al., 2024, Lamparth et al., 22 Feb 2025).

Future work includes RLHF-enhanced instruction finetuning, fuller localization and multi-language expansion, longitudinal and platform-diverse data captures, integration of structured clinical instruments, and deployment in real-time human–AI collaboration scenarios (Zhai et al., 2024, Yang et al., 2023, Chen et al., 5 Mar 2025, Haider et al., 22 Jun 2025).


The IMHI datasets and their derivatives constitute the principal shared research infrastructure for explainable, instruction-tuned NLP in mental health across clinical and social domains, providing essential resources for advancing model development, reliability, interpretability, and equity in mental health AI (Yang et al., 2023, Zhai et al., 2024, Yang et al., 2023, Lamparth et al., 22 Feb 2025, Garg et al., 2023, Chen et al., 5 Mar 2025, Haider et al., 22 Jun 2025).
