Cardiovascular Disease Risk Prediction via Social Media (2309.13147v2)
Abstract: This study uses Twitter data and sentiment analysis to predict Cardiovascular Disease (CVD) risk. We developed a new dictionary of CVD-related keywords by analyzing emotions expressed in tweets. Tweets were collected from eighteen US states, including the Appalachian region. Using the VADER model for sentiment analysis, users were classified as potentially at risk of CVD. Machine Learning (ML) models were employed to classify individuals' CVD risk and, for comparison, were also applied to a CDC dataset with demographic information. Performance was evaluated using test accuracy, precision, recall, F1 score, Matthews Correlation Coefficient (MCC), and Cohen's Kappa (CK). Results showed that models based on the emotions expressed in tweets outperformed models based on demographic data alone, enabling the identification of individuals at potential risk of developing CVD. This research highlights the potential of NLP and ML techniques for identifying individuals at risk of CVD from their tweets, offering an alternative to traditional demographic information for public health monitoring.
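As a rough illustration of the evaluation metrics named in the abstract, the sketch below computes them for a binary "at risk" / "not at risk" classifier from true and predicted labels. It is not the authors' code: the function name `binary_metrics`, the label encoding (1 = at risk, 0 = not at risk), and the choice to return 0.0 when a denominator is zero are all assumptions made for this example.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Compute common binary classification metrics (hypothetical helper).

    Assumes labels are encoded as 1 (at risk) and 0 (not at risk).
    Any metric whose denominator is zero is reported as 0.0 (an assumption).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews Correlation Coefficient: correlation between truth and prediction.
    mcc_denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = (((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
                  if n else 0.0)
    kappa = ((p_observed - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }
```

Unlike accuracy, MCC and Cohen's Kappa both account for agreement expected by chance, which is why they are often reported alongside accuracy when the classes are imbalanced.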