
A Public Dataset Tracking Social Media Discourse about the 2024 U.S. Presidential Election on Twitter/X (2411.00376v1)

Published 1 Nov 2024 in cs.SI

Abstract: In this paper, we introduce the first release of a large-scale dataset capturing discourse on X (a.k.a., Twitter) related to the upcoming 2024 U.S. Presidential Election. Our dataset comprises 22 million publicly available posts on X.com, collected from May 1, 2024, to July 31, 2024, using a custom-built scraper, which we describe in detail. By employing targeted keywords linked to key political figures, events, and emerging issues, we aligned data collection with the election cycle to capture evolving public sentiment and the dynamics of political engagement on social media. This dataset offers researchers a robust foundation to investigate critical questions about the influence of social media in shaping political discourse, the propagation of election-related narratives, and the spread of misinformation. We also present a preliminary analysis that highlights prominent hashtags and keywords within the dataset, offering initial insights into the dominant themes and conversations occurring in the lead-up to the election. Our dataset is available at: https://github.com/sinking8/usc-x-24-us-election


Summary

  • The paper introduces a large-scale dataset of over 22 million X.com posts about the 2024 U.S. Presidential Election, collected from May 1 to July 31, 2024, as a resource for studying online political narratives.
  • It details a custom scraping methodology with modular query structures and dynamic keyword filters to systematically collect diverse political discourse.
  • Preliminary analysis highlights dominant political hashtags and cross-platform media influence, informing strategies for election integrity and misinformation mitigation.

Analysis of the 2024 Election Integrity Initiative Dataset

The 2024 Election Integrity Initiative introduces a substantial dataset capturing discourse from X.com (formerly Twitter) pertinent to the 2024 U.S. Presidential Election. The dataset was compiled with a purpose-built scraping mechanism, the X-Scraper Engine, which collected over 22 million posts from May 1, 2024, to July 31, 2024. The release is intended to support research on social media narratives and their potential influence on political discourse.

Data Collection and Methodology

The authors built a custom scraping tool to extract detailed post information and interactions from X.com. The scraper operates at the browser interface level and collects posts that match a dynamic set of keywords related to key political figures, parties, and emerging topics. This approach was chosen to follow the episodic nature of electoral events and the evolving issues of the 2024 election cycle.

Key to this methodology is a modular query structure akin to the X.com API, which allows filters for specific users, keywords, and post types to be combined. This setup was meant to capture diverse content relevant to the political narrative and public engagement surrounding the election. Because collection runs through the user interface, restrictions on that interface may limit data volume; the authors relied on continuous human oversight to keep the dataset as complete as possible.
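The modular query idea can be illustrated with a small sketch. This is not the authors' X-Scraper Engine code: the `SearchQuery` class and its field names are hypothetical, and the operator strings (`from:`, `filter:`, `since:`, `until:`) are modeled on X's public search syntax as an assumption rather than taken from the paper. The sketch only shows how independent filters might be composed into a single query string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchQuery:
    """Hypothetical container for one modular search query (illustrative only)."""
    keywords: List[str] = field(default_factory=list)
    from_users: List[str] = field(default_factory=list)
    post_filter: Optional[str] = None  # assumed values such as "replies" or "links"
    since: Optional[str] = None        # start date, YYYY-MM-DD
    until: Optional[str] = None        # end date, YYYY-MM-DD

    def build(self) -> str:
        """Join every filter that is set into one space-separated query string."""
        parts = []
        if self.keywords:
            # Quote multi-word phrases so they are matched as a unit.
            terms = [f'"{k}"' if " " in k else k for k in self.keywords]
            parts.append("(" + " OR ".join(terms) + ")")
        if self.from_users:
            users = " OR ".join(f"from:{u}" for u in self.from_users)
            parts.append("(" + users + ")")
        if self.post_filter:
            parts.append(f"filter:{self.post_filter}")
        if self.since:
            parts.append(f"since:{self.since}")
        if self.until:
            parts.append(f"until:{self.until}")
        return " ".join(parts)


# Example: keyword search restricted to the collection window given in the paper.
query = SearchQuery(
    keywords=["biden", "trump", "election 2024"],
    since="2024-05-01",
    until="2024-07-31",
)
print(query.build())
```

Keeping each filter as a separate field means the keyword list can be swapped out as new topics emerge without touching the user or date filters, which is one plausible reading of what "modular" and "dynamic" mean here.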

Preliminary Analysis

In their preliminary analysis, the authors examined how often specific keywords and hashtags appear, which indicates the dominant topics of discourse. The most frequent keywords included the names of prominent political figures, such as "Biden" and "Trump," alongside ideologically charged terms like "MAGA." These observations underline how central the major candidates and ideological themes were to the electoral conversation.

Furthermore, the top hashtags point to a public focus on campaign slogans and political identity. Hashtags like "#maga," "#trump2024," and "#bidenharris2024" suggest sustained grassroots engagement and the amplification of campaign messages via digital platforms.
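A frequency analysis of this kind can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the three sample posts are made up for the example and do not come from the dataset, and the matching rules (case-insensitive hashtags, whole-word keyword matches counted once per post) are assumptions.

```python
import re
from collections import Counter

# A hashtag is taken to be "#" followed by word characters (an assumption).
HASHTAG_RE = re.compile(r"#\w+")


def count_hashtags(posts):
    """Count hashtags across posts, ignoring case."""
    counts = Counter()
    for text in posts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


def count_keywords(posts, keywords):
    """Count how many posts mention each keyword as a whole word."""
    counts = Counter()
    for text in posts:
        lowered = text.lower()
        for kw in keywords:
            if re.search(rf"\b{re.escape(kw.lower())}\b", lowered):
                counts[kw] += 1
    return counts


# Made-up sample posts, used only to exercise the functions.
sample_posts = [
    "Trump rally tonight #MAGA #Trump2024",
    "Biden speaks on the economy #BidenHarris2024",
    "MAGA crowd lines up early #maga",
]

hashtag_counts = count_hashtags(sample_posts)
keyword_counts = count_keywords(sample_posts, ["Biden", "Trump", "MAGA"])
print(hashtag_counts.most_common(3))
print(keyword_counts)
```

Note that whole-word matching treats a hashtag such as "#maga" as a mention of the keyword "MAGA" but does not count "trump2024" as a mention of "Trump"; a real analysis would need to decide explicitly how keywords and hashtags overlap.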

In terms of linked media, the dataset shows YouTube and news sites as the predominant channels through which political content was shared. This underscores the cross-platform nature of digital political discourse: posts on X.com often point to content hosted elsewhere.
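One way to measure which outside sites are linked most often is to tally the domains of URLs found in post text. The sketch below is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, the sample posts and URLs are invented, and it assumes the collected text contains expanded URLs rather than shortened links.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple URL matcher: "http://" or "https://" followed by non-space characters.
URL_RE = re.compile(r"https?://\S+")


def linked_domain(url):
    """Return the lowercased host of a URL without port or leading 'www.'."""
    host = urlparse(url).netloc.lower()
    host = host.split(":")[0]
    if host.startswith("www."):
        host = host[4:]
    return host


def count_domains(posts):
    """Count how often each external domain is linked across posts."""
    counts = Counter()
    for text in posts:
        for url in URL_RE.findall(text):
            domain = linked_domain(url)
            if domain:
                counts[domain] += 1
    return counts


# Invented sample posts, used only to exercise the functions.
sample_posts = [
    "Watch this https://www.youtube.com/watch?v=abc",
    "Clip: https://youtube.com/shorts/xyz and story: https://example.org/story",
]

domain_counts = count_domains(sample_posts)
print(domain_counts.most_common())
```

This version treats each host separately, so alternate hosts for the same service (for example a short-link domain) would need to be mapped to a common name before comparing platforms.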

Implications and Future Directions

The work provides a foundation for examining how social media narratives interact with political processes. The keyword and hashtag frequencies give researchers a starting point for studying the influence of digital discourse on public sentiment and campaign dynamics.

The initiative also has practical implications for election integrity and the mitigation of misinformation. Analyses of the dataset could inform strategies to anticipate and counter misinformation, which may help protect democratic processes against manipulative narratives.

Future work with the dataset could include longitudinal studies across the electoral timeline and the integration of cross-platform interaction metrics for a fuller view of digital engagement during elections. The dataset could also support comparisons between verified users' engagement and potential bot activity, contributing to a better understanding of authentic versus inauthentic participation in online political conversations.

Although its scope is currently limited to data that can be captured through the X.com user interface, the dataset is a useful resource for political discourse analysis and can inform both scholarly and practical work on digital socio-political dynamics.
