
Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Large Language Models (2402.01712v1)

Published 25 Jan 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Suicidal ideation detection is a vital research area that holds great potential for improving mental health support systems. However, the sensitivity surrounding suicide-related data poses challenges in accessing large-scale, annotated datasets necessary for training effective machine learning models. To address this limitation, we introduce an innovative strategy that leverages the capabilities of generative AI models, such as ChatGPT, Flan-T5, and Llama, to create synthetic data for suicidal ideation detection. Our data generation approach is grounded in social factors extracted from psychology literature and aims to ensure coverage of essential information related to suicidal ideation. In our study, we benchmarked against state-of-the-art NLP classification models, specifically, those centered around the BERT family structures. When trained on the real-world dataset, UMD, these conventional models tend to yield F1-scores ranging from 0.75 to 0.87. Our synthetic data-driven method, informed by social factors, offers consistent F1-scores of 0.82 for both models, suggesting that the richness of topics in synthetic data can bridge the performance gap across different model complexities. Most impressively, when we combined a mere 30% of the UMD dataset with our synthetic data, we witnessed a substantial increase in performance, achieving an F1-score of 0.88 on the UMD test set. Such results underscore the cost-effectiveness and potential of our approach in confronting major challenges in the field, such as data scarcity and the quest for diversity in data representation.
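The mixing experiment described in the abstract (a 30% slice of the real training data combined with synthetic data, scored by F1) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `mix_real_and_synthetic` and `f1_score`, the `(text, label)` tuple layout, and the use of a binary, single-positive-class F1 are all assumptions made for illustration. The paper may use a different averaging scheme or label set.

```python
import random


def mix_real_and_synthetic(real, synthetic, real_fraction=0.3, seed=0):
    """Hypothetical helper: keep a random `real_fraction` of the real
    examples, add every synthetic example, and shuffle the result."""
    rng = random.Random(seed)
    k = round(len(real) * real_fraction)  # number of real examples to keep
    subset = rng.sample(real, k)          # sample without replacement
    mixed = subset + list(synthetic)
    rng.shuffle(mixed)
    return mixed


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for one positive class (an assumption; the paper's
    averaging scheme is not stated in the abstract)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data only; these strings and labels are placeholders, not UMD posts.
real = [(f"real post {i}", i % 2) for i in range(10)]
synthetic = [(f"synthetic post {i}", 1) for i in range(5)]
train = mix_real_and_synthetic(real, synthetic)
```

With the toy data above, `train` holds 3 real examples and all 5 synthetic ones. A classifier would then be trained on `train` and evaluated with `f1_score` on a held-out real test set.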

