140 Characters to Victory?: Using Twitter to Predict the UK 2015 General Election

Published 6 May 2015 in cs.CY, cs.SI, and physics.soc-ph | (1505.01511v1)

Abstract: The election forecasting 'industry' is a growing one, both in the volume of scholars producing forecasts and methodological diversity. In recent years a new approach has emerged that relies on social media and particularly Twitter data to predict election outcomes. While some studies have shown the method to hold a surprising degree of accuracy there has been criticism over the lack of consistency and clarity in the methods used, along with inevitable problems of population bias. In this paper we set out a 'baseline' model for using Twitter as an election forecasting tool that we then apply to the UK 2015 General Election. The paper builds on existing literature by extending the use of Twitter as a forecasting tool to the UK context and identifying its limitations, particularly with regard to its application in a multi-party environment with geographic concentration of power for minor parties.



Summary

The paper’s main contribution is extending Twitter-based forecasting to the UK’s multi-party election by integrating sentiment analysis with seat prediction methods.
Its methodology combines tweet collection, sentiment scoring, and human annotation (to correct for misclassified tweets) with historical swing data (to convert vote share into seats).
The model forecast a hung parliament with Labour winning the most seats, but it underestimated regional parties such as the SNP, which highlights the model's limitations.

Predicting the UK 2015 General Election Using Twitter Data

The paper explores the use of Twitter as an election forecasting tool, specifically targeting the UK 2015 General Election. It seeks to establish a baseline model that addresses various criticisms and methodological challenges previously identified in the literature. The authors propose a model that incorporates sentiment analysis and adjusts the forecasting approach to account for the distribution of parliamentary power, particularly in the context of a multi-party system.

The primary contribution of this work is the extension of Twitter-based forecasting to the UK political environment, which is characterized by a multi-party system and geographically concentrated parties such as the SNP. Previous studies of Twitter-based election forecasting, such as those by Tumasjan et al. (2010) and DiGrazia et al. (2013), have primarily focused on two-party systems or less complex electoral contexts, and have often yielded mixed results. This paper aims to improve on these approaches by incorporating sentiment analysis and seat-based predictions.

Methodological Approach

The study employs Twitter data collected through the Twitter streaming API, focusing on tweets containing party and leader names. A sentiment analysis tool is then applied to classify the sentiment of each tweet, with the authors opting to exclude tweets with extreme negative sentiment scores. The decision to sum sentiment scores, rather than simply counting mentions, allows the model to capture the magnitude of positivity or negativity towards a party or leader.
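The difference between counting mentions and summing sentiment can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tweets, the integer scores, the function names, and the `floor` cutoff standing in for "extreme negative" are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical (party, sentiment score) pairs; the scores are invented for illustration.
scored_tweets = [
    ("Labour", 3),
    ("Labour", 1),
    ("Labour", -1),
    ("Conservative", 4),
    ("Conservative", -5),
]


def mention_counts(tweets):
    """Count how many tweets mention each party, ignoring sentiment."""
    counts = defaultdict(int)
    for party, _score in tweets:
        counts[party] += 1
    return dict(counts)


def sentiment_totals(tweets, floor=None):
    """Sum sentiment scores per party, dropping tweets scored below `floor`."""
    totals = defaultdict(int)
    for party, score in tweets:
        if floor is not None and score < floor:
            continue  # assumed cutoff for "extreme negative" tweets
        totals[party] += score
    return dict(totals)


print(mention_counts(scored_tweets))
print(sentiment_totals(scored_tweets, floor=-4))
```

With these made-up inputs, Labour leads on mentions (3 to 2), while Conservative leads on summed sentiment (4 to 3) once the tweet scored -5 is dropped, so the two measures can rank parties differently.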

Significant effort is made to address false positives in tweet classification, especially for terms like "Labour" and "Greens," which are widely used outside UK politics. The researchers have human annotators label a sample of tweets to estimate the proportion that actually refers to the intended political context, and they adjust each party's positivity score accordingly.
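One plausible way to apply such an adjustment is to scale each raw score by the share of annotated tweets judged relevant. The sketch below assumes this simple multiplicative correction; the function names, labels, and scores are hypothetical, and the paper's exact adjustment may differ.

```python
def relevance_rate(labels):
    """Fraction of an annotated sample judged relevant (True) to UK politics."""
    return sum(labels) / len(labels)


def adjust_scores(raw_scores, annotated_samples):
    """Scale each party's raw score by its estimated relevance rate.

    Parties without an annotated sample are assumed fully relevant (rate 1.0).
    """
    adjusted = {}
    for party, score in raw_scores.items():
        labels = annotated_samples.get(party)
        rate = relevance_rate(labels) if labels else 1.0
        adjusted[party] = score * rate
    return adjusted


# Invented numbers: 3 of 4 sampled "Labour" tweets and 1 of 2 sampled
# "Greens" tweets were labelled as being about UK politics.
raw_scores = {"Labour": 1000, "Greens": 200, "UKIP": 300}
annotated_samples = {
    "Labour": [True, True, True, False],
    "Greens": [True, False],
}

print(adjust_scores(raw_scores, annotated_samples))
```

In practice the annotated sample would need to be large enough for the relevance rate to be a stable estimate; the tiny samples here only keep the arithmetic easy to follow.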

For the projection of election results, the paper introduces a conversion from calculated vote share to seat predictions, using historical swing calculations from the 2010 election as a reference point. This step is vital for addressing the spatial element of UK elections, where regional support significantly influences outcomes.
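A common way to turn national vote shares into seats is uniform national swing: apply each party's national change since the previous election to every constituency and award the seat to the projected leader. The sketch below assumes that approach, which may not match the paper's exact swing calculation. The parties "A", "B", "C", the constituencies, and all figures (in percentage points) are invented.

```python
def project_seats(base_constituencies, base_national, forecast_national):
    """Project seat counts by applying a uniform national swing.

    base_constituencies: {constituency: {party: share at the previous election}}
    base_national / forecast_national: {party: national share}
    """
    # National change per party since the baseline election.
    swing = {
        party: forecast_national[party] - base_national[party]
        for party in forecast_national
    }
    seats = {}
    for shares in base_constituencies.values():
        # Shift every party's local share by its national swing.
        projected = {p: s + swing.get(p, 0) for p, s in shares.items()}
        winner = max(projected, key=projected.get)
        seats[winner] = seats.get(winner, 0) + 1
    return seats


base_national = {"A": 40, "B": 35, "C": 25}
forecast_national = {"A": 35, "B": 40, "C": 25}
base_constituencies = {
    "X": {"A": 45, "B": 40, "C": 15},  # marginal seat: flips from A to B
    "Y": {"A": 50, "B": 30, "C": 20},  # safe seat: A holds
    "Z": {"A": 30, "B": 28, "C": 42},  # regional stronghold: C holds
}

print(project_seats(base_constituencies, base_national, forecast_national))
```

Because the same swing is applied everywhere, this kind of projection cannot capture a surge confined to one region, which is one reason a nationally derived forecast can miss geographically concentrated parties.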

Key Findings

The forecasting model suggests a hung parliament as the likely outcome, with Labour winning the most seats, but it significantly underestimates the SNP's seat total. This underestimation highlights the model's limitations in handling regionalist parties without geo-location data, a known challenge when national-level data is used to predict regionally skewed outcomes.

The study meets several key standards for a minimally acceptable forecasting model, as outlined in prior work by Gayo-Avello and others. By providing a genuine forecast (one issued before the election), incorporating tweet sentiment, and adjusting for seat distribution, the paper addresses salient methodological gaps identified in the existing literature.

Implications and Future Directions

The findings underscore the utility of Twitter data as a supplementary tool for election forecasting but also stress the need for improvements, particularly in handling demographic and geographic biases. Future iterations of this model should consider integrating geo-location capabilities and correcting for demographic biases to improve prediction accuracy, especially for parties with strong regional concentrations.

The research advances understanding of the role of social media data in electoral prediction and has broader implications for its use in other multi-party electoral contexts. As social media platforms continue to evolve, the methods for analyzing their data must also advance, with privacy, data integrity, and representativeness at the forefront of future research.
