LocalTweets to LocalHealth: A Mental Health Surveillance Framework Based on Twitter Data (2402.13452v2)

Published 21 Feb 2024 in cs.SI, cs.CL, and cs.LG

Abstract: Prior research on Twitter (now X) data has provided positive evidence of its utility in developing supplementary health surveillance systems. In this study, we present a new framework to surveil public health, focusing on mental health (MH) outcomes. We hypothesize that locally posted tweets are indicative of local MH outcomes and collect tweets posted from 765 neighborhoods (census block groups) in the USA. We pair the tweets from each neighborhood with the corresponding MH outcome reported by the Centers for Disease Control and Prevention (CDC) to create a benchmark dataset, LocalTweets. With LocalTweets, we present the first population-level evaluation task for Twitter-based MH surveillance systems. We then develop an efficient and effective method, LocalHealth, for predicting MH outcomes based on LocalTweets. When used with GPT3.5, LocalHealth achieves the highest F1-score and accuracy, 0.7429 and 79.78%, respectively, a 59% improvement in F1-score over GPT3.5 in a zero-shot setting. We also use LocalHealth to extrapolate the CDC's estimates to proxy unreported neighborhoods, achieving an F1-score of 0.7291. Our work suggests that Twitter data can be effectively leveraged to simulate neighborhood-level MH outcomes.
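
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how accuracy, F1-score, and a relative improvement are typically computed for a binary classification task. It is not the authors' evaluation code. The toy labels, the function names, and the reading of "59% improvement" as a relative (not absolute) gain are all assumptions made for illustration; under that reading, the zero-shot baseline F1 would be roughly 0.7429 / 1.59.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, e.g. 0.59 means 59%."""
    return (new - baseline) / baseline


# Hypothetical toy labels (1 = poor MH outcome, 0 = otherwise); not from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))

# Assumption: the 59% figure is a relative gain, so the implied baseline is:
implied_zero_shot_f1 = 0.7429 / 1.59
print("implied zero-shot F1:", round(implied_zero_shot_f1, 4))
```

The implied baseline is only a back-of-the-envelope figure derived from the two numbers in the abstract; the paper itself should be consulted for the reported zero-shot score.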
