Towards Interpretable Mental Health Analysis with Large Language Models
Abstract: The latest large language models (LLMs), such as ChatGPT, exhibit strong capabilities in automated mental health analysis. However, existing studies have several limitations, including inadequate evaluations, limited exploration of prompting strategies, and a failure to explore LLMs for explainability. To bridge these gaps, we comprehensively evaluate the mental health analysis and emotional reasoning abilities of LLMs on 11 datasets across 5 tasks. We examine the effects of different prompting strategies that incorporate unsupervised and distantly supervised emotional information. Building on these prompts, we explore LLMs for interpretable mental health analysis by instructing them to generate an explanation for each of their decisions. We conduct strict human evaluations to assess the quality of the generated explanations, yielding a novel dataset of 163 human-assessed explanations. We benchmark existing automatic evaluation metrics on this dataset to guide future work. According to the results, ChatGPT shows strong in-context learning ability but still lags significantly behind advanced task-specific methods. Careful prompt engineering with emotional cues and expert-written few-shot examples can also effectively improve its performance on mental health analysis. In addition, ChatGPT generates explanations that approach human performance, showing great potential for explainable mental health analysis.
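The abstract does not specify the exact prompt format. As a rough illustration only, the following sketch shows one way a prompt could combine the three ingredients mentioned above: an optional emotional cue, optional expert-written few-shot examples, and a request for an explanation of the decision. `build_prompt`, `FewShotExample`, the field layout, and the wording are all hypothetical and are not the authors' templates; the emotional cue is assumed to be a short text label produced by some external tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FewShotExample:
    """Hypothetical expert-written demonstration: post, label, explanation."""
    post: str
    label: str
    explanation: str


def build_prompt(
    post: str,
    task_question: str,
    labels: Sequence[str],
    emotion_cue: Optional[str] = None,
    examples: Sequence[FewShotExample] = (),
) -> str:
    """Assemble a classification prompt that also asks for an explanation.

    Illustrative assumption only; not the paper's actual prompt template.
    """
    parts = []
    # Optional expert-written few-shot demonstrations come first.
    for ex in examples:
        parts.append(
            f'Post: "{ex.post}"\n'
            f"Answer: {ex.label}\n"
            f"Reasoning: {ex.explanation}"
        )
    # The target post, optionally enriched with an emotional cue
    # (assumed to be a sentiment/emotion label from an external tool).
    target = f'Post: "{post}"'
    if emotion_cue is not None:
        target += f"\nEmotional cue: {emotion_cue}"
    parts.append(target)
    # Constrain the answer space and request an explanation for the decision.
    parts.append(
        f"{task_question} Answer with one of: {', '.join(labels)}. "
        "Then explain your reasoning step by step."
    )
    return "\n\n".join(parts)


# Usage: a zero-shot prompt with an emotional cue for a binary task.
prompt = build_prompt(
    post="I can't sleep and nothing feels worth doing anymore.",
    task_question="Does the poster show signs of depression?",
    labels=["yes", "no"],
    emotion_cue="the post expresses sadness",
)
print(prompt)
```

Keeping the cue and the demonstrations as optional arguments makes it easy to compare zero-shot, emotion-enhanced, and few-shot variants of the same prompt, which mirrors the kind of prompting-strategy comparison the abstract describes.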