#Election2020: The First Public Twitter Dataset on the 2020 US Presidential Election (2010.00600v1)

Published 1 Oct 2020 in cs.SI

Abstract: The integrity of democratic political discourse is central to guaranteeing free and fair elections. With social media often dictating the tones and trends of politics-related discussion, it is of paramount importance to be able to study online chatter, especially in the run-up to important voting events, like in the case of the upcoming November 3, 2020 U.S. Presidential Election. Limited access to social media data is often the first barrier to impede, hinder, or slow down progress, and ultimately our understanding of online political discourse. To mitigate this issue and try to empower the Computational Social Science research community, we decided to publicly release a massive-scale, longitudinal dataset of U.S. politics- and election-related tweets. This multilingual dataset that we have been collecting for over one year encompasses hundreds of millions of tweets and tracks all salient U.S. politics trends, actors, and events between 2019 and 2020. It predates and spans the whole period of Republican and Democratic primaries, with real-time tracking of all presidential contenders on both sides of the aisle. After that, it focuses on presidential and vice-presidential candidates. Our dataset release is curated, documented, and will be constantly updated on a weekly basis, until the November 3, 2020 election and beyond. We hope that the academic community, computational journalists, and research practitioners alike will all take advantage of our dataset to study relevant scientific and social issues, including problems like misinformation, information manipulation, interference, and distortion of online political discourse that have been prevalent in the context of recent election events in the United States and worldwide. Our dataset is available at: https://github.com/echen102/us-pres-elections-2020

Public Twitter Dataset on the 2020 U.S. Presidential Election

This paper, authored by Emily Chen, Ashok Deb, and Emilio Ferrara, presents a comprehensive dataset of tweets related to the 2020 U.S. presidential election. The focus of this work is on facilitating research in computational social science by providing a large-scale, publicly accessible dataset of election-related Twitter activity. With social media increasingly influencing political discourse, this dataset aims to enable analyses of how online conversations evolve in the context of significant political events.

Dataset Overview

This dataset is a large longitudinal collection of Twitter data. Collection began in May 2019, and the authors plan weekly updates through the November 3, 2020 election and beyond. The dataset captures hundreds of millions of tweets (more than 600 million at the time of writing), with an emphasis on real-time tracking of U.S. political discourse during the election cycle. It covers the Democratic and Republican primaries and the general election campaign, providing insight into both the primary contenders and the eventual presidential and vice-presidential candidates.

Data Collection and Methodology

The authors used Twitter’s streaming API through the Tweepy library to collect tweets. They track mentions of, and follow accounts related to, the candidates running in the 2020 U.S. presidential election. The collection was dynamic: the keyword list was adjusted to reflect evolving election issues, influential events, and key developments, such as candidate dropouts or the shift to virtual campaigning precipitated by the COVID-19 pandemic.
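
The two filter types described above (tracked keywords or mentions, and followed accounts) can be illustrated with a small offline sketch. This is not the authors' collection code and it does not call Twitter or Tweepy; the class name `StreamFilter`, its methods, and the placeholder keywords and account IDs are all hypothetical, and the substring matching is a simplification of whatever matching rules the streaming API applies. It only shows how a tracked-keyword set can be grown and pruned over time while the capture rule stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class StreamFilter:
    """Hypothetical container for the two filter types the summary mentions."""

    track: set = field(default_factory=set)   # lower-cased keywords / mentions
    follow: set = field(default_factory=set)  # account IDs stored as strings

    def add_keywords(self, *keywords: str) -> None:
        # Called when a new issue or event becomes relevant.
        self.track.update(k.lower() for k in keywords)

    def drop_keywords(self, *keywords: str) -> None:
        # Called, for example, when a candidate leaves the race.
        self.track.difference_update(k.lower() for k in keywords)

    def matches(self, tweet: dict) -> bool:
        # A tweet is kept if its author is followed or its text
        # contains any tracked keyword (case-insensitive substring test).
        text = tweet.get("text", "").lower()
        author = str(tweet.get("user", {}).get("id_str", ""))
        return author in self.follow or any(k in text for k in self.track)


# Placeholder values, not the authors' actual keyword or account lists.
flt = StreamFilter()
flt.add_keywords("@candidate_a", "candidate_b")
flt.follow.add("1001")

by_keyword = {"text": "Rally for CANDIDATE_B today", "user": {"id_str": "7"}}
by_account = {"text": "Good morning", "user": {"id_str": "1001"}}
unrelated = {"text": "Good morning", "user": {"id_str": "7"}}

print(flt.matches(by_keyword), flt.matches(by_account), flt.matches(unrelated))
```

In a live collector the `track` and `follow` sets would presumably be handed to the streaming client as its filter parameters; that wiring is left out here because it needs credentials and network access.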

The dataset comprises more than 600 million tweets, representing over 4 terabytes of raw data. The initial release covers data collected from June 20, 2020, to September 6, 2020, totaling approximately 240 million tweets, and the authors plan to keep extending the dataset beyond election day. Because Twitter’s Developer Agreement permits sharing only Tweet IDs, the authors provide tools and guidelines that let researchers retrieve the full tweet payloads while complying with Twitter's data use policies.
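
Because only Tweet IDs are distributed, anyone using the data first has to turn ID lists into lookup requests. The sketch below shows one way to prepare that step; it is not the tooling the authors ship. It assumes the IDs arrive as plain text with one numeric ID per line, and it assumes a lookup endpoint that accepts up to 100 IDs per request; both the file layout and the `LOOKUP_BATCH_SIZE` value are assumptions to check against the repository and the API being used. The helper names `read_tweet_ids` and `batched` are hypothetical.

```python
from typing import Iterable, Iterator, List

# Assumption: the lookup endpoint accepts at most this many IDs per request.
LOOKUP_BATCH_SIZE = 100


def read_tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield numeric Tweet IDs, skipping blank or malformed lines."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id.isdigit():
            yield tweet_id


def batched(ids: Iterable[str], size: int = LOOKUP_BATCH_SIZE) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size`, one list per lookup request."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Synthetic input standing in for an ID file: 250 IDs plus some noise lines.
raw_lines = [f"{n}\n" for n in range(1, 251)] + ["\n", "not-an-id\n"]
batch_sizes = [len(b) for b in batched(read_tweet_ids(raw_lines))]
print(batch_sizes)
```

Both helpers are generators, so an ID file can be streamed line by line rather than loaded into memory, which matters at the scale of hundreds of millions of IDs. Tweets that have been deleted or made private since collection will not be returned by a lookup, so the retrieved set is generally smaller than the ID list.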

Implications for Research and Future Directions

This dataset is a resource for researchers studying online political discourse. It supports analysis of misinformation dynamics, potential interference in democratic processes, and the evolving nature of political engagement on social media. By exposing hashtag, mention, and keyword usage across political affiliations, it lets researchers explore partisan trends, conspiracy theories, and pandemic-related topics intertwined with election discussions.

Future analysis of this dataset can shed light on the implications of digital communications for democracy, offering both practical and theoretical insights. Researchers have the opportunity to examine the impact of social media on voter sentiment, polarization, and candidate strategies. The potential for interdisciplinary studies is significant, encompassing fields such as political science, communication, data science, and public policy.

In summary, the release of this election-related Twitter dataset is a significant contribution to computational social science. By unlocking access to a broad spectrum of data, it empowers researchers to advance the understanding of online political discourse and its influence on real-world electoral outcomes. This resource sets the stage for continued exploration of the interplay between social media platforms and democratic elections globally.

Authors (3)

Emily Chen (16 papers)
Ashok Deb (8 papers)
Emilio Ferrara (197 papers)

Citations (66)

GitHub

GitHub - echen102/us-pres-elections-2020: The repository contains a collection of tweets IDs associated with the 2020 U.S. Presidential Elections through 6 months post-inauguration. (129 stars)
