COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification (2005.10898v1)

Published 21 May 2020 in cs.IR and cs.SI

Abstract: Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19's informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus-specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential ML classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naive Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.

Authors (5)

Jim Samuel
G. G. Md. Nawaz Ali
Md. Mokhlesur Rahman
Ek Esawi
Yana Samuel

Citations (338)


Summary

Analysis of COVID-19 Sentiment Through Machine Learning

The paper "COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification" examines public sentiment about the COVID-19 pandemic using Twitter data and ML techniques. It focuses on quantifying shifts in public emotion, particularly the fear associated with the pandemic, and evaluates how well ML methods classify the sentiment of tweets.

The authors use R and its sentiment analysis packages to analyze COVID-19-specific tweets, combining descriptive textual analytics with data visualizations to trace how fear sentiment evolved over time. The analysis relies on R packages such as syuzhet and sentimentr, using the NRC sentiment lexicon to score tweets on emotions such as fear, sadness, and anger.
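
Lexicon-based emotion scoring of this kind amounts to looking up each token in a word-to-emotion table and counting hits. The Python sketch below is illustrative only: the authors worked in R, and `TOY_LEXICON`, `emotion_counts`, and `daily_fear_share` are hypothetical names. The six lexicon entries are invented for the example and are not taken from the NRC lexicon. The daily "share of tweets containing a fear word" is one assumed way to build a fear-over-time series and is not necessarily how the paper's curve was computed.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon mapping words to emotion labels, in the spirit of
# the NRC emotion lexicon. These entries are illustrative, not copied from NRC.
TOY_LEXICON = {
    "virus": {"fear"},
    "panic": {"fear"},
    "lockdown": {"fear"},
    "death": {"fear", "sadness"},
    "grief": {"sadness"},
    "hoarding": {"anger"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def emotion_counts(text):
    """Count lexicon hits per emotion for one tweet (simple bag-of-words lookup)."""
    counts = Counter()
    for token in TOKEN_RE.findall(text.lower()):
        for emotion in TOY_LEXICON.get(token, ()):
            counts[emotion] += 1
    return counts


def daily_fear_share(tweets):
    """Share of tweets per day containing at least one fear word.

    tweets: iterable of (day, text) pairs, where day is any sortable label
    such as an ISO date string.
    """
    totals, fearful = Counter(), Counter()
    for day, text in tweets:
        totals[day] += 1
        if emotion_counts(text)["fear"] > 0:
            fearful[day] += 1
    return {day: fearful[day] / totals[day] for day in sorted(totals)}


tweets = [
    ("2020-02-20", "Stay calm and wash your hands"),
    ("2020-02-20", "Reading about the virus today"),
    ("2020-03-25", "Panic buying before the lockdown"),
    ("2020-03-25", "Death toll rising, so much panic"),
]
print(daily_fear_share(tweets))  # {'2020-02-20': 0.5, '2020-03-25': 1.0}
```

A real analysis would substitute the full lexicon and real timestamps. The pure lookup design also shows a known weakness of lexicon methods: negation ("not scared") and sarcasm are invisible to it.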

Two ML methods are evaluated: Naïve Bayes and logistic regression. The paper compares how accurately each classifies tweets of varying lengths. For short tweets, Naïve Bayes achieves a classification accuracy of 91%, while logistic regression reaches a reasonable 74%. Both methods perform worse on longer tweets, which points to limitations arising from factors such as data scarcity and the unstructured, informal nature of Twitter text.
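
To make the two classifier families concrete, here is a from-scratch Python sketch. It is not the authors' R implementation. It assumes regex tokenization, add-one (Laplace) smoothing for Naive Bayes, and plain full-batch gradient descent for logistic regression. The six training tweets, the 0/1 label encoding (1 = negative), and the helper names `tokenize`, `train_logreg`, and `predict_logreg` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split a tweet into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Binary logistic regression on bag-of-words counts; labels must be 0 or 1."""
    index = {w: i for i, w in enumerate(sorted({t for s in texts for t in tokenize(s)}))}
    feats = [Counter(tokenize(s)) for s in texts]
    weights, bias, n = [0.0] * len(index), 0.0, len(labels)
    for _ in range(epochs):  # full-batch gradient descent on mean log loss
        grad_w, grad_b = [0.0] * len(index), 0.0
        for x, y in zip(feats, labels):
            err = sigmoid(bias + sum(weights[index[w]] * c for w, c in x.items())) - y
            grad_b += err
            for w, c in x.items():
                grad_w[index[w]] += err * c
        bias -= lr * grad_b / n
        weights = [wt - lr * g / n for wt, g in zip(weights, grad_w)]
    return index, weights, bias


def predict_logreg(model, text):
    """Return 1 if the model's log-odds are positive, else 0; unknown words are ignored."""
    index, weights, bias = model
    z = bias + sum(weights[index[w]] * c for w, c in Counter(tokenize(text)).items() if w in index)
    return 1 if z > 0 else 0


# Toy, hypothetical training tweets (1 = negative, 0 = positive).
train = [
    ("virus panic spreading", 1),
    ("fear lockdown death", 1),
    ("scared afraid worried", 1),
    ("hope recovery together", 0),
    ("grateful nurses doctors", 0),
    ("thankful community support", 0),
]
texts = [t for t, _ in train]
labels = [y for _, y in train]

nb = MultinomialNB().fit(texts, labels)
lr_model = train_logreg(texts, labels)
for tweet in ["so much fear and panic", "thankful for the nurses"]:
    print(tweet, nb.predict(tweet), predict_logreg(lr_model, tweet))
# prints:
# so much fear and panic 1 1
# thankful for the nurses 0 0
```

One common explanation for Naive Bayes doing well on small, short-text datasets is that it only needs per-class word counts. Logistic regression, by contrast, fits one weight per vocabulary word and can need more data to estimate those weights reliably. This is a general observation, and the paper's own explanation may differ.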

This research presents several key findings and implications:

Insight into Fear Sentiment: Fear sentiment rose markedly toward the end of March 2020, coinciding with the rapid escalation of COVID-19 cases in the United States. The paper visualizes this progression in plots such as the "Fear Curve," which tracks sentiment changes over time.
Machine Learning for Text Classification: The paper shows that standard ML classifiers can label public sentiment in tweets with useful accuracy, with potential applications in managing public perceptions during global crises. Its results suggest that short to medium-length tweets are classified more accurately than longer ones, consistent with an inverse relationship between text length and classification accuracy.
Theoretical and Practical Implications: The findings support integrating ML techniques into sentiment analysis for real-time monitoring of public emotion during crises. The authors also suggest that sentiment classification models could be improved by incorporating additional data sources and methods.
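
A length effect like this can be checked by scoring a classifier's predictions separately for short and long tweets. The Python sketch below assumes a single character-count cut-off. `accuracy_by_length` and its `max_short_chars=80` default are hypothetical and do not reflect the length thresholds used in the paper.

```python
from collections import defaultdict


def accuracy_by_length(texts, y_true, y_pred, max_short_chars=80):
    """Split examples into 'short'/'long' by character count and score each bucket.

    max_short_chars is a hypothetical cut-off, not the paper's threshold.
    Returns {bucket: (accuracy, n)} for buckets containing at least one example.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, truth, pred in zip(texts, y_true, y_pred):
        bucket = "short" if len(text) <= max_short_chars else "long"
        totals[bucket] += 1
        hits[bucket] += int(truth == pred)
    return {b: (hits[b] / totals[b], totals[b]) for b in totals}


texts = ["a" * 10, "b" * 20, "c" * 100, "d" * 150]
print(accuracy_by_length(texts, [1, 0, 1, 0], [1, 0, 0, 0]))
# {'short': (1.0, 2), 'long': (0.5, 2)}
```

The function returns the bucket size alongside each accuracy because, with scarce data, a bucket holding only a handful of tweets can produce an accuracy figure that swings widely from one sample to the next.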

While the paper successfully achieves its goal of deriving insights from Twitter data using ML, it also acknowledges limitations such as the need for models with greater generalization potential across different datasets and lexicons. Future research could involve expanding these methods to other social media platforms and exploring hybrid classification approaches that incorporate multiple sentiment lexicons.

In conclusion, the paper provides a structured methodology for applying ML to COVID-19-related sentiment analysis, which could serve as a foundation for broader applications in public health monitoring and crisis management. Its findings shed light on the interplay between public sentiment and information dissemination during a pandemic and can help inform policy decisions and public health strategies.
