COVID-19 Vaccine Hesitancy on Social Media: Building a Public Twitter Dataset of Anti-vaccine Content, Vaccine Misinformation and Conspiracies (2105.05134v2)

Published 11 May 2021 in cs.SI and cs.CY

Abstract: False claims about COVID-19 vaccines can undermine public trust in ongoing vaccination campaigns, thus posing a threat to global public health. Misinformation originating from various sources has been spreading online since the beginning of the COVID-19 pandemic. In this paper, we present a dataset of Twitter posts that exhibit a strong anti-vaccine stance. The dataset consists of two parts: a) a streaming keyword-centered data collection with more than 1.8 million tweets, and b) a historical account-level collection with more than 135 million tweets. The former leverages the Twitter streaming API to follow a set of specific vaccine-related keywords starting from mid-October 2020. The latter consists of all historical tweets of 70K accounts that were engaged in the active spreading of anti-vaccine narratives. We present descriptive analyses showing the volume of activity over time, geographical distributions, topics, news sources, and inferred account political leaning. This dataset can be used in studying anti-vaccine misinformation on social media and enable a better understanding of vaccine hesitancy. In compliance with Twitter's Terms of Service, our anonymized dataset is publicly available at: https://github.com/gmuric/avax-tweets-dataset


Analyzing COVID-19 Vaccine Hesitancy Through Social Media Data

The paper "COVID-19 Vaccine Hesitancy on Social Media: Building a Public Twitter Dataset of Anti-vaccine Content, Vaccine Misinformation and Conspiracies" by Muric, Wu, and Ferrara presents a large dataset for studying vaccine hesitancy as expressed on Twitter. The dataset comprises two collections: a streaming, keyword-centered collection and an account-level historical collection. It is intended as a resource for studying anti-vaccine sentiment and the spread of misinformation on social media during the COVID-19 pandemic.

The authors use the Twitter Streaming API to collect tweets containing predetermined anti-vaccine keywords, resulting in over 1.8 million tweets from 719,000 unique accounts between October 2020 and May 2021. These keywords reflect strong opposition to vaccines and were curated through an iterative snowball method. In addition, the authors collected the historical tweets of approximately 70,000 accounts identified as spreading anti-vaccine narratives, more than 135 million tweets in total.
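One way to picture a single round of the snowball curation described above is sketched here. This is an illustrative assumption, not the authors' code: the function names, the substring-matching rule, and the use of co-occurring hashtags as candidates are all hypothetical, and the keywords shown are placeholders rather than the paper's actual list. In practice the candidates would still need manual review before being added.

```python
import re
from collections import Counter

# Matches hashtags such as "#example"; the captured group excludes the "#".
HASHTAG_RE = re.compile(r"#(\w+)")


def matches(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def snowball_candidates(texts, keywords, top_n=5):
    """Propose new keywords: hashtags that co-occur with the current keywords.

    texts    -- iterable of tweet texts (hypothetical input format)
    keywords -- set of lowercase seed keywords (placeholders, not the paper's list)
    """
    counts = Counter()
    for text in texts:
        if matches(text, keywords):
            for tag in HASHTAG_RE.findall(text.lower()):
                if tag not in keywords:
                    counts[tag] += 1
    return [tag for tag, _ in counts.most_common(top_n)]
```

Repeating this step with the reviewed candidates merged into the keyword set gives the iterative behavior; stopping when no new relevant candidates appear is one plausible termination rule.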

The analysis of these tweets covers several aspects: the frequency and temporal trends of anti-vaccine discourse, its geographic distribution, the political leanings of the accounts involved, and the media sources they share most often. A notable finding is that these accounts skew substantially towards sharing right-leaning media, consistent with previous studies linking political orientation and vaccine hesitancy. The paper also lists the top hashtags and most-shared websites, which illustrate how these narratives shift over time and the role of low-credibility news sources in perpetuating misinformation.
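A ranking of the most-shared websites can be approximated by counting the domains of the URLs found in tweets. The sketch below is a minimal illustration under assumptions: the function names are hypothetical, the input is assumed to be a list of already-expanded URLs (shortened links would need resolving first), and it does not reproduce the authors' processing or any credibility labeling.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host of a URL without a leading 'www.', or '' if none."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host


def top_domains(urls, n=10):
    """Count how often each domain appears and return the n most common."""
    domains = (domain_of(u) for u in urls)
    counts = Counter(d for d in domains if d)
    return counts.most_common(n)
```

Joining the resulting counts against an external list of source ratings would be one way to estimate the share of low-credibility or politically leaning sources.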

The avax score introduced in this research quantifies the extent of anti-vaccination rhetoric within an account's content, distinguishing between original tweets and retweets. This measure can assist in identifying influential spreaders of misinformation and understanding how vaccine hesitancy manifests differently across communication behaviors on social media.
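The summary above does not give the formula for the avax score, so the following is only a sketch of one plausible reading: the share of an account's tweets flagged as anti-vaccine, computed separately for original tweets and retweets. The function name, the input format, and the `is_antivax` classifier are assumptions for illustration and should not be taken as the authors' definition.

```python
def avax_scores(tweets, is_antivax):
    """Return the flagged share of an account's original tweets and retweets.

    tweets     -- list of dicts with 'text' (str) and 'is_retweet' (bool); hypothetical format
    is_antivax -- callable taking tweet text and returning True if it is flagged (assumed)
    """
    def share(subset):
        # An account with no tweets of this kind gets a score of 0.0.
        if not subset:
            return 0.0
        flagged = sum(1 for t in subset if is_antivax(t["text"]))
        return flagged / len(subset)

    originals = [t for t in tweets if not t["is_retweet"]]
    retweets = [t for t in tweets if t["is_retweet"]]
    return {"original": share(originals), "retweet": share(retweets)}
```

Keeping the two shares separate makes it possible to tell accounts that author such content from accounts that mainly amplify it.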

The geographic data analysis reveals that the majority of these conversations originate from English-speaking countries, predominantly the United States, with notable variance in activity across states. Such insights could guide targeted public health interventions and policy-making to address vaccine hesitancy more effectively.

The dataset and analysis provide a foundation for future research on COVID-19 vaccine misinformation on social media platforms. Because the account-level collection spans several years, it supports temporal analysis of how misinformation evolves. Researchers can use this publicly available dataset to explore vaccine hesitancy, its implications for public health, and potential strategies to combat misinformation, and in doing so clarify the sociocultural, political, and informational factors that underpin vaccine hesitancy.


Authors: Goran Muric, Yusong Wu, Emilio Ferrara


