Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media (2307.02313v2)

Published 5 Jul 2023 in cs.CL

Abstract: In this work, we present the contribution of the BLUE team to the eRisk Lab task on searching for symptoms of depression. The task consists of retrieving and ranking sentences from Reddit posts that convey symptoms of depression from the BDI-II questionnaire. Given that synthetic data generated by LLMs has proven to be a reliable means of augmenting data and fine-tuning downstream models, we chose to use ChatGPT to generate synthetic data for each of the symptoms of the BDI-II questionnaire. We designed a prompt so that the generated data is richer and more semantically diverse than the BDI-II responses for each question and, at the same time, contains the emotional and anecdotal experiences specific to the more intimate way people share experiences on Reddit. We perform semantic search and rank the sentences by their relevance to the BDI-II symptoms using cosine similarity. We used two state-of-the-art transformer-based models (MentalRoBERTa and a variant of MPNet) to embed the social media posts as well as the original and the generated BDI-II responses. Our results show that using sentence embeddings from a model designed for semantic search outperforms using embeddings from a model pre-trained on mental health data. Furthermore, the generated synthetic data proved too specific for this task: the approach relying only on the original BDI-II responses performed best.
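The ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes the sentence and BDI-II response embeddings have already been computed (in the paper, by MentalRoBERTa or an MPNet variant) and are passed in as plain float vectors. The toy 3-dimensional vectors, the helper names `cosine_similarity` and `rank_sentences`, the `top_k` cutoff, and scoring each sentence by its maximum similarity over a symptom's reference responses are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_sentences(
    sentence_embs: Dict[str, Vector],
    symptom_embs: List[Vector],
    top_k: int = 10,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[str, float]]:
    """Rank sentences for one BDI-II symptom by cosine similarity.

    Each sentence is scored by its highest similarity to any of the
    symptom's reference responses (max aggregation is an assumption).
    """
    if not symptom_embs:
        raise ValueError("need at least one reference embedding for the symptom")
    scored = [
        (sent_id, max(cosine_similarity(emb, ref) for ref in symptom_embs))
        for sent_id, emb in sentence_embs.items()
    ]
    # Highest similarity first; ties broken by sentence id for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical toy embeddings standing in for real model outputs.
sentences = {
    "s1": [0.9, 0.1, 0.0],
    "s2": [0.0, 1.0, 0.0],
    "s3": [0.7, 0.7, 0.0],
}
sadness_refs = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
ranking = rank_sentences(sentences, sadness_refs, top_k=2)
print([sid for sid, _ in ranking])  # ['s1', 's3']
```

Max aggregation rewards a sentence that closely matches any single reference response; averaging over the responses would instead favor sentences close to the symptom's overall centroid. The abstract does not state which aggregation the paper used.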

Authors (1)
  1. Ana-Maria Bucur
Citations (10)
