Deep Sentiment Classification and Topic Discovery on Novel Coronavirus or COVID-19 Online Discussions: NLP Using LSTM Recurrent Neural Network Approach (2004.11695v1)

Published 24 Apr 2020 in cs.IR and cs.CL

Abstract: Internet forums and public social media, such as online healthcare forums, provide a convenient channel for users (people/patients) concerned about health issues to discuss and share information with each other. In late December 2019, an outbreak of a novel coronavirus (infection from which results in the disease named COVID-19) was reported, and, due to the rapid spread of the virus in other parts of the world, the World Health Organization declared a state of emergency. In this paper, we use automated extraction of COVID-19 related discussions from social media and a natural language processing (NLP) method based on topic modeling to uncover various issues related to COVID-19 from public opinions. Moreover, we investigate how to use an LSTM recurrent neural network for sentiment classification of COVID-19 comments. Our findings shed light on the importance of using public opinions and suitable computational techniques to understand issues surrounding COVID-19 and to guide related decision-making.

Authors (4)

Hamed Jelodar
Yongli Wang
Rita Orji
Hucheng Huang

Citations (293)

Summary

Deep Sentiment Classification and Topic Discovery on COVID-19 Online Discussions

The paper by Jelodar et al. presents an analytical framework that uses NLP and deep learning techniques to explore public opinion on COVID-19 as expressed in social media discussions, specifically on Reddit. The work is motivated by the need to systematically analyze the large volume of online information and extract actionable insights about pandemic-related sentiment.

Methodology Overview

The authors combine NLP, sentiment analysis, and deep learning in a multi-step pipeline. It starts with data collection from COVID-19-related subreddits. A pre-processing stage then cleans the text by removing noise and stop-words. Latent Dirichlet Allocation (LDA) is applied next for topic modeling, which identifies the key topics and themes emerging in the online discussions.
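The pre-processing and topic-modeling steps can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the stop-word list, the sample comments, the hyperparameters (`alpha`, `beta`, `iterations`), and the choice of collapsed Gibbs sampling as the LDA inference method are all assumptions made for the example.

```python
import random
import re

# Hypothetical, tiny stop-word list; the paper's actual list is not given here.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

def preprocess(comment):
    """Lowercase, strip URLs, keep alphabetic tokens, drop stop-words."""
    text = re.sub(r"https?://\S+", " ", comment.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns count tables and the vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return topic_word, doc_topic, vocab

def top_words(topic_word, vocab, n=3):
    """Return the n most frequent words assigned to each topic."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in ranked[:n]])
    return result

# Made-up comments standing in for Reddit data.
comments = [
    "The hospitals are full and the nurses need masks",
    "Nurses and doctors need more masks in hospitals",
    "Schools closed and the lockdown rules changed again",
    "New lockdown rules for schools and shops",
]
docs = [preprocess(c) for c in comments]
topic_word, doc_topic, vocab = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

On a corpus of realistic size, a library implementation of LDA would normally replace the hand-written sampler; the loop is spelled out here only to make the per-token resampling step visible.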

Subsequently, a Long Short-Term Memory (LSTM) recurrent neural network is proposed for sentiment classification. The choice is motivated by the LSTM's capacity to learn long-term dependencies, which matter for capturing semantic and sentiment nuances in text. The model assigns each comment to one of five fine-grained sentiment categories: very positive, positive, neutral, negative, and very negative.
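To make the LSTM step concrete, the sketch below runs the forward pass of a single-layer LSTM over a sequence of token ids and maps the final hidden state to the five sentiment categories with a softmax. It is an illustration under stated assumptions, not the paper's architecture: the class name `TinyLSTMClassifier`, the layer sizes, and the random, untrained weights are all hypothetical, and training is omitted entirely.

```python
import math
import random

LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyLSTMClassifier:
    """Untrained single-layer LSTM plus softmax output (illustrative only)."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=16, num_classes=5, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.embeddings = random_matrix(vocab_size, embed_dim, rng)
        # One weight matrix and bias per gate, applied to the concatenation [h_prev, x_t].
        self.gates = {
            name: (random_matrix(hidden_dim, hidden_dim + embed_dim, rng), [0.0] * hidden_dim)
            for name in ("input", "forget", "output", "candidate")
        }
        self.out_weights = random_matrix(num_classes, hidden_dim, rng)
        self.out_bias = [0.0] * num_classes

    def _gate(self, name, hx, activation):
        weights, bias = self.gates[name]
        return [activation(z + b) for z, b in zip(matvec(weights, hx), bias)]

    def forward(self, token_ids):
        """Return class probabilities for one comment given as a list of token ids."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for token_id in token_ids:
            hx = h + self.embeddings[token_id]
            i = self._gate("input", hx, sigmoid)
            f = self._gate("forget", hx, sigmoid)
            o = self._gate("output", hx, sigmoid)
            g = self._gate("candidate", hx, math.tanh)
            # The cell state keeps part of the old memory and adds gated new content.
            c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, g)]
            h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
        logits = [z + b for z, b in zip(matvec(self.out_weights, h), self.out_bias)]
        peak = max(logits)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, token_ids):
        probs = self.forward(token_ids)
        return LABELS[max(range(len(probs)), key=probs.__getitem__)]

model = TinyLSTMClassifier(vocab_size=10)
probabilities = model.forward([1, 4, 7, 2])
print(dict(zip(LABELS, probabilities)))
print(model.predict([1, 4, 7, 2]))
```

Because the weights are random, the predicted label carries no meaning; the point is the data flow. The forget gate `f` controls how much of the previous cell state survives each step, which is the mechanism behind the long-term dependency argument above.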

Results and Observations

The data analyzed consisted of 563,079 Reddit comments collected over a two-month period. Applying the LDA model, the paper identifies prominent topics such as public sentiment towards healthcare, governmental policies, societal impacts of the pandemic, and individual experiences or concerns regarding COVID-19. Sentiment analysis complements the topic modeling results: discussions frequently carry a clear sentiment polarity, with negative sentiment notably prevalent in contexts such as infection rates and health service capacity.

The authors report a classification accuracy of 81.15% for the LSTM model, an improvement over the traditional machine learning approaches used for comparison, which supports the model's effectiveness for this sentiment classification task.
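For reference, classification accuracy is the fraction of comments whose predicted label matches the reference label. The short sketch below computes it on made-up labels; the data is purely illustrative and does not reproduce the 81.15% figure.

```python
def accuracy(gold, predicted):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Hypothetical labels for five comments; four of five predictions match.
gold_labels = ["negative", "very negative", "neutral", "positive", "negative"]
predicted_labels = ["negative", "negative", "neutral", "positive", "negative"]
print(accuracy(gold_labels, predicted_labels))  # 0.8
```

With five classes of uneven size, accuracy alone can hide weak performance on rare classes, so per-class precision and recall are a common complement.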

Implications and Future Research

The results have dual implications. Practically, the framework can serve as a decision-support tool for policymakers and healthcare providers by deriving public sentiment and topic prevalence directly from unstructured online data. Theoretically, this paper expands the application of NLP and deep learning within public health informatics, showcasing how computational techniques can interface with social media data to monitor public perception and guide responsive action.

Future research can extend this framework to other social media platforms, incorporate multilingual datasets, and improve the model's predictive accuracy through more advanced deep learning architectures. Exploring hybrid models, such as integrating fuzzy logic with deep learning, might further improve sentiment and opinion classification.

Conclusion

This paper delivers a strong methodological contribution to the domain of sentiment analysis on pandemic-related discussions. By leveraging the volume and diversity of online discussions, coupled with sophisticated analytical techniques, it provides a useful avenue for extracting valuable insights into public sentiment and experiences during the COVID-19 pandemic. The interdisciplinary approach lays the groundwork for ongoing enhancements in the field of real-time sentiment analysis pertinent to public health crises.
