
A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams (1905.08772v2)

Published 18 May 2019 in cs.CY, cs.CL, cs.IR, cs.LG, and cs.SI

Abstract: With the rise of the Internet, there is a growing need to build intelligent systems that can efficiently deal with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection, or the identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams, since users provide their data over time. In addition, they must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions that could affect people's lives, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models are not well suited to this scenario, because they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework for ERD problems. We evaluated our model on CLEF's eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier outperformed these models and standard classifiers, despite being less computationally expensive and being able to explain its rationale.

