
An Assessment on Comprehending Mental Health through Large Language Models (2401.04592v2)

Published 9 Jan 2024 in cs.CL

Abstract: Mental health challenges pose considerable global burdens on individuals and communities. Recent data indicate that more than 20% of adults may encounter at least one mental disorder in their lifetime. On the one hand, advances in LLMs have enabled diverse applications; yet a significant research gap persists in understanding and enhancing the potential of LLMs within the domain of mental health. On the other hand, an outstanding question across applications is whether LLMs can comprehend expressions of human mental health conditions in natural language. This study presents an initial evaluation of LLMs to address this gap, comparing the performance of Llama-2 and ChatGPT with classical Machine Learning as well as Deep Learning models. Our results on the DAIC-WOZ dataset show that transformer-based models, such as BERT and XLNet, outperform the LLMs.

Introduction

Developments in AI have seen LLMs such as ChatGPT offer a broad spectrum of capabilities across sectors, and there is growing interest in applying this technology to mental health. Mental health concerns are critical, with over 20% of adults potentially facing a form of mental disorder in their lifetime. The impact is not only personal; the economic toll is also significant, with disorders such as depression and anxiety causing substantial productivity losses worldwide.

Analysis of LLMs for Mental Health Applications

The paper evaluates the competency of two widely used LLMs, Llama-2 and ChatGPT, against conventional Machine Learning and Deep Learning models. The central question is these models' ability to interpret and assess mental health conditions from conversational text, specifically the DAIC-WOZ dataset of transcribed interviews focusing on psychological distress. A noteworthy aspect of the paper is its use of the PHQ-4 questionnaire, which asks about patients' experiences of anxiety and depression, as a template for structuring prompts to the LLMs.
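The paper's exact prompts are not reproduced here, but the general idea of structuring an LLM prompt around the four standard PHQ-4 items can be sketched as follows. The prompt wording and the `build_phq4_prompt` helper are illustrative assumptions, not the authors' actual prompting setup:

```python
# The four PHQ-4 items: two anxiety items (GAD-2) followed by
# two depression items (PHQ-2), each rated 0-3.
PHQ4_ITEMS = [
    "Feeling nervous, anxious, or on edge",
    "Not being able to stop or control worrying",
    "Feeling down, depressed, or hopeless",
    "Little interest or pleasure in doing things",
]

def build_phq4_prompt(transcript: str) -> str:
    """Assemble a prompt asking an LLM to rate each PHQ-4 item 0-3
    based on an interview transcript (e.g., from DAIC-WOZ)."""
    lines = [
        "Below is a transcript of a clinical-style interview.",
        "For each statement, rate how often the interviewee appears to have",
        "been bothered by it over the last two weeks, on a scale of",
        "0 (not at all) to 3 (nearly every day). Answer with four integers.",
        "",
        "Transcript:",
        transcript,
        "",
        "Statements:",
    ]
    lines += [f"{i + 1}. {item}" for i, item in enumerate(PHQ4_ITEMS)]
    return "\n".join(lines)

print(build_phq4_prompt("Interviewer: How have you been sleeping lately? ..."))
```

A prompt of this shape can then be sent to any chat-style LLM; the paper's contribution lies in comparing the resulting ratings against fine-tuned baselines rather than in the prompt template itself.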

Methodology and Results

The methodology section details the data preprocessing steps and the evaluation of the various models. It outlines the prompting techniques used to elicit precise LLM responses reflecting PHQ-4 scores for anxiety and depression. The paper finds that fine-tuned Transformer-based models, including BERT and XLNet, outperform LLMs such as Llama-2 and ChatGPT when tasked with interpreting symptoms of mental health conditions.
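For context, PHQ-4 scoring itself is simple: the first two items form the anxiety (GAD-2) subscale and the last two the depression (PHQ-2) subscale, each ranging 0-6, with a subscale score of 3 or more conventionally treated as a positive screen. A minimal sketch (the `score_phq4` function name and dictionary layout are assumptions for illustration):

```python
def score_phq4(item_scores):
    """Score a PHQ-4 response: four item scores, each 0-3.
    Items 1-2 form the anxiety (GAD-2) subscale and items 3-4 the
    depression (PHQ-2) subscale; a subscale score >= 3 is the
    conventional positive-screen cutoff."""
    if len(item_scores) != 4 or any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("expected four item scores in the range 0-3")
    anxiety = item_scores[0] + item_scores[1]
    depression = item_scores[2] + item_scores[3]
    return {
        "anxiety": anxiety,
        "depression": depression,
        "total": anxiety + depression,
        "anxiety_flag": anxiety >= 3,
        "depression_flag": depression >= 3,
    }

# Example: high anxiety items, low depression items.
print(score_phq4([2, 2, 1, 0]))
# anxiety subscale 4 (flagged), depression subscale 1 (not flagged)
```

Whether a model's scores are produced by a fine-tuned classifier or by prompting an LLM, mapping them through this scale gives the common yardstick the paper uses for comparison.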

Conclusion and Reflections

The research provides compelling evidence that, while LLMs have impressive language-comprehension capabilities, there is room for growth in mental health assessment. Transformer models fine-tuned for this specific application currently outperform larger general-purpose LLMs. It is also important to recognize the sensitive nature of the data and the complexity of mental health, which pose significant barriers to unbiased model performance. The authors suggest future work involving a deeper exploration of LLMs to overcome these challenges and improve their application in mental health contexts. The findings help chart the path forward, both technologically and ethically, at the intersection of AI and mental health support.

Authors (3)
  1. Mihael Arcan (9 papers)
  2. Fionn Delahunty (2 papers)
  3. David-Paul Niland (2 papers)
Citations (2)