Analyzing COVID-19 Vaccine Hesitancy Through Social Media Data
The paper "COVID-19 Vaccine Hesitancy on Social Media: Building a Public Twitter Dataset of Anti-vaccine Content, Vaccine Misinformation and Conspiracies" by Muric, Wu, and Ferrara presents a large public dataset for studying vaccine hesitancy as expressed on Twitter. The dataset comprises two collections: a keyword-based streaming dataset and an account-level historical dataset. Collected at a critical point in the COVID-19 pandemic, it serves as a resource for studying anti-vaccine sentiment and the spread of misinformation on social media.
The authors use the Twitter Streaming API to collect tweets containing predetermined anti-vaccine keywords, yielding over 1.8 million tweets from 719,000 unique accounts between October 2020 and May 2021. The keywords reflect strong opposition to vaccines and were curated through an iterative snowball method. The authors also collected a historical dataset of over 135 million tweets from approximately 70,000 accounts identified as spreading anti-vaccine narratives.
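As a rough illustration of keyword-based filtering and one round of snowball expansion, the sketch below matches tweet text against a seed list and proposes hashtags that co-occur with it. It is not the authors' pipeline: the function names, the `min_count` threshold, and the placeholder keyword `seedterm` are assumptions made for this example, and any proposed candidates would still need manual review before being added to the keyword list.

```python
import re
from collections import Counter


def matches_keywords(text, keywords):
    """Return True if the text contains any keyword (case-insensitive substring match)."""
    lowered = text.lower()
    return any(kw in lowered for kw in keywords)


def snowball_step(tweets, keywords, min_count=2):
    """Propose new candidate hashtags that co-occur with the current keywords.

    `keywords` is a set of lowercase strings. `min_count` is a hypothetical
    threshold: a hashtag must appear in at least that many matching tweets.
    """
    counts = Counter()
    for text in tweets:
        if not matches_keywords(text, keywords):
            continue
        # Count each hashtag at most once per tweet.
        for tag in set(re.findall(r"#(\w+)", text.lower())):
            if tag not in keywords:
                counts[tag] += 1
    return {tag for tag, n in counts.items() if n >= min_count}


# Placeholder data, not taken from the dataset.
sample_tweets = [
    "#seedterm is trending #examplea",
    "more on #seedterm #examplea",
    "unrelated post #examplea",
    "#seedterm again #exampleb",
]
candidates = snowball_step(sample_tweets, {"seedterm"})
```

Repeating this step with the reviewed candidates merged into the keyword set gives an iterative expansion in the spirit of a snowball method.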
The analysis of these tweets covers several aspects: the frequency and temporal trends of anti-vaccine discourse, its geographic distribution, the political leanings of the accounts involved, and the media sources they share most often. A notable finding is a substantial skew toward right-leaning media among these accounts, consistent with previous studies linking political orientation and vaccine hesitancy. The paper also lists the top hashtags and websites, which illustrate the dynamic nature of these narratives and the role of low-credibility news sources in spreading misinformation.
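A ranking of the most-shared websites can be approximated by counting the domains of shared links. The sketch below is one minimal way to do that, assuming the URLs have already been expanded from shortened links; the function name `top_domains` and the sample URLs are hypothetical and not part of the paper.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=10):
    """Return the n most frequently shared domains as (domain, count) pairs."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Treat "www.example.com" and "example.com" as the same site.
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts.most_common(n)


# Placeholder URLs for illustration only.
shared_urls = [
    "https://www.example.com/a",
    "http://example.com/b",
    "https://news.example.org/x",
]
ranking = top_domains(shared_urls, n=2)
```

Joining such a ranking with an external list of media bias or credibility ratings would be a separate step, and the choice of rating source is left open here.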
The avax score introduced in this research quantifies the extent of anti-vaccination rhetoric within an account's content, distinguishing between original tweets and retweets. This measure can assist in identifying influential spreaders of misinformation and understanding how vaccine hesitancy manifests differently across communication behaviors on social media.
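This summary does not give the formula for the avax score, so the sketch below should not be read as the authors' definition. It shows one simple stand-in under stated assumptions: the share of an account's tweets that contain any anti-vaccine keyword, computed separately for original tweets and retweets. The `Tweet` class, the function names, and the placeholder keyword `seedterm` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    is_retweet: bool


def keyword_share(tweets, keywords):
    """Fraction of tweets containing any keyword, or None if there are no tweets."""
    if not tweets:
        return None
    hits = sum(
        1 for t in tweets if any(kw in t.text.lower() for kw in keywords)
    )
    return hits / len(tweets)


def account_scores(tweets, keywords):
    """Compute the keyword share separately for original tweets and retweets."""
    originals = [t for t in tweets if not t.is_retweet]
    retweets = [t for t in tweets if t.is_retweet]
    return {
        "original": keyword_share(originals, keywords),
        "retweet": keyword_share(retweets, keywords),
    }


# Placeholder timeline for a single account.
timeline = [
    Tweet("my view on #seedterm", is_retweet=False),
    Tweet("lunch was good", is_retweet=False),
    Tweet("RT: #seedterm thread", is_retweet=True),
    Tweet("RT: more #seedterm", is_retweet=True),
]
scores = account_scores(timeline, {"seedterm"})
```

Keeping the two values separate reflects the distinction the text draws between what an account writes and what it amplifies.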
The geographic data analysis reveals that the majority of these conversations originate from English-speaking countries, predominantly the United States, with notable variance in activity across states. Such insights could guide targeted public health interventions and policy-making to address vaccine hesitancy more effectively.
The dataset and analysis presented by the authors provide a foundation for future research on COVID-19 vaccine misinformation on social media platforms. The historical account data, which span several years, support temporal analysis of how misinformation evolves. Because the dataset is publicly available, researchers can use it to explore various dimensions of vaccine hesitancy, its implications for public health, and potential strategies to combat misinformation. Such work can help clarify the interplay of sociocultural, political, and informational factors that underpin vaccine hesitancy in the digital age.