Your Model Is Not Predicting Depression Well And That Is Why: A Case Study of PRIMATE Dataset (2403.00438v1)
Abstract: This paper addresses the quality of annotations in mental health datasets used for NLP-based depression level estimation from social media texts. While previous research has relied on social media-based datasets annotated with binary categories, i.e., depressed or non-depressed, recent datasets such as D2S and PRIMATE aim for nuanced annotations using PHQ-9 symptoms. However, most of these datasets rely on crowd workers who lack domain knowledge. Focusing on the PRIMATE dataset, our study reveals concerns regarding annotation validity, particularly for the symptom of lack of interest or pleasure (anhedonia). Through reannotation by a mental health professional, we introduce finer labels and textual spans as evidence, identifying a notable number of false positives. Our refined annotations, to be released under a Data Use Agreement, offer a higher-quality test set for anhedonia detection. This study underscores the necessity of addressing annotation quality issues in mental health datasets and advocates for improved methodologies to enhance NLP model reliability in mental health assessments.
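Identifying false positives in the original annotations amounts to scoring the crowd labels against the expert reannotation, which serves as the reference. The sketch below is a minimal illustration that assumes binary per-post labels for a single symptom (1 = symptom present), aligned post by post. It does not model the paper's finer labels or evidence spans. `compare_labels` and `LabelAgreement` are hypothetical names, and the toy labels are invented for illustration. They are not PRIMATE data.

```python
from dataclasses import dataclass


@dataclass
class LabelAgreement:
    true_positives: int
    false_positives: int
    false_negatives: int
    true_negatives: int

    @property
    def precision(self) -> float:
        # Share of crowd "symptom present" labels that the expert confirms.
        predicted = self.true_positives + self.false_positives
        return self.true_positives / predicted if predicted else 0.0


def compare_labels(crowd: list[int], expert: list[int]) -> LabelAgreement:
    """Score binary crowd labels against expert labels used as the reference."""
    if len(crowd) != len(expert):
        raise ValueError("label lists must be aligned post by post")
    if any(label not in (0, 1) for label in crowd + expert):
        raise ValueError("labels must be binary (0 or 1)")
    tp = fp = fn = tn = 0
    for c, e in zip(crowd, expert):
        if c == 1 and e == 1:
            tp += 1  # both annotators mark the symptom
        elif c == 1 and e == 0:
            fp += 1  # crowd marks the symptom, expert rejects it
        elif c == 0 and e == 1:
            fn += 1  # crowd misses a symptom the expert finds
        else:
            tn += 1  # both agree the symptom is absent
    return LabelAgreement(tp, fp, fn, tn)


# Hypothetical toy labels (1 = anhedonia present), NOT taken from PRIMATE.
crowd_labels = [1, 1, 1, 0, 1, 0]
expert_labels = [1, 0, 0, 0, 1, 0]
result = compare_labels(crowd_labels, expert_labels)
print(result.false_positives, result.precision)  # prints: 2 0.5
```

A low precision for the crowd labels relative to the expert reference is the quantity that reflects many false positives. A model trained and evaluated on such labels can look worse or better than it really is, depending on how the noise falls.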
- Kirill Milintsevich
- Kairit Sirts
- Gaël Dias