Public Twitter Dataset on the 2020 U.S. Presidential Election
In this paper, Emily Chen, Ashok Deb, and Emilio Ferrara present a large-scale, publicly accessible dataset of tweets about the 2020 U.S. presidential election, built to support research in computational social science. Because social media increasingly shapes political discourse, the dataset is meant to let researchers study how online conversations evolve around major political events.
Dataset Overview
The dataset is a large, longitudinal collection of Twitter data. Collection began in May 2019 and is intended to run through the weeks following the election held on November 3, 2020. The dataset contains more than 600 million tweets and emphasizes real-time tracking of U.S. political discourse across the election cycle, including the Democratic and Republican primaries and the general election campaign. It therefore covers both the candidates who competed in the primaries and the eventual presidential and vice-presidential nominees.
Data Collection and Methodology
The authors used Twitter’s streaming API, accessed through the Tweepy library, to collect tweets continuously. They tracked mentions of, and followed the accounts associated with, every candidate who ran in the 2020 U.S. presidential election. The collection was dynamic: the keyword list was extended over time with terms tied to emerging election issues, influential events, and key developments such as candidates dropping out or campaigns shifting to virtual events because of the COVID-19 pandemic.
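The dynamic tracking described above can be pictured as a small configuration object whose keyword and account lists grow over time. This is a minimal, hypothetical sketch, not the authors' collector: the class and method names are illustrative, keywords are assumed to follow the documented `track` semantics of Twitter's v1.1 `statuses/filter` endpoint (every space-separated term of a phrase must appear, ignoring case and order), and that matching is only approximated locally with a simple tokenizer. For `follow`, only the author check is modeled. The 400-phrase and 5,000-account caps are the documented limits of that endpoint. In a live collector, the current `keywords` and `account_ids` would be passed to Tweepy's stream filter as its `track` and `follow` arguments, and changing them would typically require reconnecting the stream.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Documented caps for the Twitter API v1.1 statuses/filter endpoint.
MAX_TRACK_PHRASES = 400
MAX_FOLLOW_IDS = 5000

_TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> set[str]:
    """Lower-cased word tokens; '#' and '@' are dropped, so '#Vote' -> 'vote'."""
    return set(_TOKEN.findall(text.lower()))


@dataclass
class TrackingConfig:
    """Hypothetical holder for the evolving keyword and account lists."""

    keywords: set[str] = field(default_factory=set)
    account_ids: set[str] = field(default_factory=set)

    def add_keywords(self, *phrases: str) -> None:
        """Add phrases; reject the whole update if it would exceed the track cap."""
        updated = self.keywords | {p.lower().strip() for p in phrases if p.strip()}
        if len(updated) > MAX_TRACK_PHRASES:
            raise ValueError(f"{len(updated)} phrases exceeds the {MAX_TRACK_PHRASES} limit")
        self.keywords = updated

    def follow(self, *user_ids: str) -> None:
        """Add account IDs; reject the whole update if it would exceed the follow cap."""
        updated = self.account_ids | set(user_ids)
        if len(updated) > MAX_FOLLOW_IDS:
            raise ValueError(f"{len(updated)} accounts exceeds the {MAX_FOLLOW_IDS} limit")
        self.account_ids = updated

    def matches(self, text: str, author_id: str) -> bool:
        """Approximate local check: followed author, or all terms of some phrase present."""
        if author_id in self.account_ids:
            return True
        tokens = tokenize(text)
        return any(set(phrase.split()) <= tokens for phrase in self.keywords)
```

For example, after `config.add_keywords("vote by mail")`, the call `config.matches("Go VOTE by mail!", "1")` returns `True`, because every term of the phrase appears in the tweet regardless of case. Rejecting an oversized update as a whole keeps the configuration valid for the endpoint at all times.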
The dataset comprises more than 600 million tweets, representing over 4 terabytes of raw data. The initial release covers tweets collected from June 20, 2020, to September 6, 2020, approximately 240 million in total, and the authors plan to keep extending the dataset beyond election day. Because Twitter’s Developer Agreement permits sharing only Tweet IDs, the publication describes tools and guidelines that researchers can use to retrieve ("hydrate") the full tweet payloads in compliance with Twitter’s data-use policies.
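Because only IDs are distributed, every user of the dataset has to rehydrate them. The sketch below shows the batching logic under stated assumptions: the ID files are assumed to hold one Tweet ID per line, and `lookup` is a hypothetical callable standing in for whatever client is used, taking up to 100 IDs (the per-request maximum of Twitter's tweet lookup endpoints) and returning the payloads it found, keyed by ID. IDs that come back without a payload usually belong to tweets that were deleted or made protected, or whose accounts were suspended, so the function reports them instead of failing.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Iterator

# Twitter's tweet lookup endpoints accept at most 100 IDs per request.
BATCH_SIZE = 100

Payload = dict[str, Any]
Lookup = Callable[[list[str]], dict[str, Payload]]  # hypothetical client signature


def batched_ids(lines: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield lists of at most `size` IDs, skipping blank lines.

    IDs stay strings: 64-bit Tweet IDs lose precision in tools that parse
    them as floating-point numbers (e.g. JavaScript or spreadsheets).
    """
    batch: list[str] = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id:
            continue
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def hydrate(lines: Iterable[str], lookup: Lookup) -> tuple[dict[str, Payload], list[str]]:
    """Rehydrate IDs batch by batch; return (found payloads, IDs with no payload)."""
    hydrated: dict[str, Payload] = {}
    missing: list[str] = []
    for batch in batched_ids(lines):
        found = lookup(batch)  # one request per batch of <= 100 IDs
        for tweet_id in batch:
            if tweet_id in found:
                hydrated[tweet_id] = found[tweet_id]
            else:
                missing.append(tweet_id)
    return hydrated, missing
```

Since iterating an open file yields one line at a time, `lines` can simply be a file object, so large ID files never need to be loaded into memory at once. Recording the missing IDs also lets a researcher report how much of the original collection is still retrievable.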
Implications for Research and Future Directions
This dataset is a valuable resource for researchers studying online political discourse. It supports analyses of misinformation dynamics, potential interference in democratic processes, and the changing nature of political engagement on social media. By surfacing how hashtags, mentions, and keywords are used across different political affiliations, it lets researchers explore partisan trends, conspiracy theories, and pandemic-related topics intertwined with election discussions.
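As an illustration of the hashtag and mention analyses described above, the sketch below tallies both from rehydrated payloads. It assumes the Twitter API v1.1 JSON layout, where hashtags appear under `entities.hashtags` (each with a `text` field) and mentions under `entities.user_mentions` (each with a `screen_name`), and where streamed tweets longer than 140 characters carry their complete entities under `extended_tweet`. The function names and the choice to case-fold, so that `#Election2020` and `#election2020` merge, are illustrative and not taken from the paper.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable


def tweet_entities(tweet: dict[str, Any]) -> dict[str, Any]:
    """Return the most complete entities object available in a v1.1 payload."""
    extended = tweet.get("extended_tweet") or {}
    return extended.get("entities") or tweet.get("entities") or {}


def count_entities(tweets: Iterable[dict[str, Any]]) -> tuple[Counter, Counter]:
    """Tally case-folded hashtags and mentioned screen names across tweets."""
    hashtags: Counter = Counter()
    mentions: Counter = Counter()
    for tweet in tweets:
        entities = tweet_entities(tweet)
        hashtags.update(tag["text"].casefold() for tag in entities.get("hashtags", []))
        mentions.update(m["screen_name"].casefold() for m in entities.get("user_mentions", []))
    return hashtags, mentions
```

Calling `hashtags.most_common(20)` on the result then gives a ranked list that can be computed separately for different groups of accounts and compared.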
Further analysis of this dataset can clarify what digital communication means for democracy, with both practical and theoretical payoffs. Researchers can examine the impact of social media on voter sentiment, polarization, and campaign strategy, and the data lends itself to interdisciplinary work spanning political science, communication, data science, and public policy.
In summary, the release of this election-related Twitter dataset is a significant contribution to computational social science. By opening up a broad spectrum of data, it helps researchers advance the understanding of online political discourse and its potential influence on real-world electoral outcomes, and it lays the groundwork for continued study of the interplay between social media platforms and democratic elections.