Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media

Published 24 Oct 2016 in cs.CL, cs.IR, and cs.SI (arXiv:1610.07363v1)

Abstract: Breaking news leads to situations of fast-paced reporting in social media, producing all kinds of updates related to news stories, albeit with the caveat that some of those early updates tend to be rumours, i.e., information with an unverified status at the time of posting. Flagging information that is unverified can be helpful to avoid the spread of information that may turn out to be false. Detection of rumours can also feed a rumour tracking system that ultimately determines their veracity. In this paper we introduce a novel approach to rumour detection that learns from the sequential dynamics of reporting during breaking news in social media to detect rumours in new stories. Using Twitter datasets collected during five breaking news stories, we experiment with Conditional Random Fields as a sequential classifier that leverages context learnt during an event for rumour detection, which we compare with the state-of-the-art rumour detection system as well as other baselines. In contrast to existing work, our classifier does not need to observe tweets querying a piece of information to deem it a rumour, but instead we detect rumours from the tweet alone by exploiting context learnt during the event. Our classifier achieves competitive performance, beating the state-of-the-art classifier that relies on querying tweets with improved precision and recall, as well as outperforming our best baseline with nearly 40% improvement in terms of F1 score. The scale and diversity of our experiments reinforce the generalisability of our classifier.

Summary

The paper's main contribution is a CRF-based sequential classifier for rumor detection during breaking news, which improves F1 score by nearly 40% over the best baseline and beats the state-of-the-art query-based system in both precision and recall.
The methodology relies on a bottom-up data collection strategy in which journalists annotated tweets from five real breaking news events as rumors or non-rumors.
The findings suggest that modelling the temporal and contextual dynamics of reporting makes automated rumor detection more effective, which could reduce the need for manual intervention and help curb misinformation.

The paper by Zubiaga et al. introduces a novel approach to rumor detection on social media during breaking news that leverages the sequential dynamics of reporting. Earlier methods often rely on manual intervention, on tweets that query a piece of information, or on curated lists of expressions. These approaches can be restrictive in fast-evolving news contexts, where new, previously unseen rumors keep emerging. In contrast, this study uses Conditional Random Fields (CRF) to classify tweets automatically as rumors or non-rumors, taking into account the context in which the information arises during an event.
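
To make "sequential" concrete, here is a minimal sketch of Viterbi decoding in a linear-chain CRF over a time-ordered stream of tweets. It assumes the per-tweet (emission) and label-to-label (transition) scores are already known; in the paper these weights are learned from annotated events, whereas here they are hand-set, hypothetical numbers. The `viterbi` helper is illustrative and is not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

LABELS = ("rumor", "non-rumor")


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]: unnormalised score for giving tweet t the label y.
    transitions[(p, y)]: score for label p being followed by label y.
    """
    if not emissions:
        return []
    best = [dict(emissions[0])]        # best[t][y]: best path score ending in y at step t
    back: List[Dict[str, str]] = []    # back[t-1][y]: best predecessor of y at step t
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[t - 1][p] + transitions[(p, y)])
            scores[y] = best[t - 1][prev] + transitions[(prev, y)] + emissions[t][y]
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Start from the best final label and follow the back-pointers.
    label = max(LABELS, key=lambda y: best[-1][y])
    path = [label]
    for ptrs in reversed(back):
        label = ptrs[label]
        path.append(label)
    return path[::-1]


# Usage: on its own, the middle tweet slightly favours "non-rumor",
# but transitions that reward label continuity pull it to "rumor".
emissions = [
    {"rumor": 2.0, "non-rumor": 0.0},
    {"rumor": 0.4, "non-rumor": 0.5},
    {"rumor": 2.0, "non-rumor": 0.0},
]
transitions = {
    ("rumor", "rumor"): 1.0, ("non-rumor", "non-rumor"): 1.0,
    ("rumor", "non-rumor"): 0.0, ("non-rumor", "rumor"): 0.0,
}
print(viterbi(emissions, transitions))  # ['rumor', 'rumor', 'rumor']
```

A non-sequential classifier would label the middle tweet "non-rumor" from its own score alone; the CRF's transition scores let the surrounding context override that weak local evidence.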

Methodology

The researchers employed a bottom-up data collection strategy, working closely with journalists to identify and annotate tweets from breaking news stories. Using datasets from five distinct events, the study aimed primarily to detect unverified information without relying on later tweets that query it, which is the signal the current state-of-the-art system depends on.
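
Because the classifier is meant to detect rumors in new stories, one plausible data setup is to order each event's tweets by posting time and hold out an entire event for testing. The sketch below assumes a simple `Tweet` record and a leave-one-event-out protocol; the field names and the exact split are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Tweet:
    event: str        # breaking news story the tweet belongs to
    timestamp: float  # posting time, e.g. Unix seconds
    text: str
    label: str        # "rumor" or "non-rumor", as annotated by journalists


def event_sequences(tweets: List[Tweet]) -> Dict[str, List[Tweet]]:
    """Group tweets by event and order each group by posting time."""
    groups: Dict[str, List[Tweet]] = defaultdict(list)
    for tw in tweets:
        groups[tw.event].append(tw)
    return {ev: sorted(seq, key=lambda tw: tw.timestamp) for ev, seq in groups.items()}


def leave_one_event_out(
    seqs: Dict[str, List[Tweet]],
) -> Iterator[Tuple[str, List[List[Tweet]], List[Tweet]]]:
    """Yield (held-out event, training sequences, test sequence) folds."""
    for held_out in sorted(seqs):
        train = [seq for ev, seq in seqs.items() if ev != held_out]
        yield held_out, train, seqs[held_out]
```

With five events this yields five folds, each testing on a story the model never saw during training.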

The study used a CRF as a sequential classifier and compared it against non-sequential classifiers, including Maximum Entropy, Naive Bayes, Support Vector Machines (SVM), and Random Forests. It also compared the CRF model with the state-of-the-art method of Zhao et al., which relies on manually curated regular expressions to find tweets that query a claim, to assess whether modelling sequential dynamics outperforms fixed expressions.
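
The regular-expression approach attributed to Zhao et al. can be approximated by a few patterns that flag tweets asking whether a claim is true. The patterns below are hypothetical examples chosen for illustration; they are not Zhao et al.'s curated list. The last check shows the limitation the paper targets: a rumor stated flatly, with no one yet questioning it, is not flagged.

```python
import re
from typing import List

# Hypothetical enquiry patterns; the list curated by Zhao et al. differs.
ENQUIRY_PATTERNS: List["re.Pattern[str]"] = [
    re.compile(r"\bis (?:this|that|it) true\b", re.IGNORECASE),
    re.compile(r"\b(?:really|seriously)\s*\?", re.IGNORECASE),
    re.compile(r"\bcan anyone confirm\b", re.IGNORECASE),
]


def is_enquiry(text: str) -> bool:
    """True if the tweet matches any enquiry pattern."""
    return any(p.search(text) for p in ENQUIRY_PATTERNS)


print(is_enquiry("Is this true? Reports of a second gunman"))       # True
print(is_enquiry("BREAKING: gunman reported inside the building"))  # False
```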

Results and Analysis

The CRF classifier achieved a nearly 40% improvement in F1 score over the best baseline. This gain underscores the value of sequential context: by modelling how reporting unfolds over time, the classifier captures dynamics of rumor emergence and spread that hold across the different events studied.
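
The comparison is stated in terms of F1, the harmonic mean of precision and recall for the rumor class. The helpers below show how these figures are typically computed; the function names are illustrative, and the toy labels in any usage are invented, not results from the paper. The abstract does not say whether "nearly 40%" is a relative or an absolute gain, so `relative_gain` reflects only one possible reading.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str], pred: Sequence[str], positive: str = "rumor"
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline` (0.4 means +40%)."""
    return (new - baseline) / baseline
```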

The paper also examines how consistently the CRF performs at different stages of an event's timeline, indicating that it remains robust beyond the initial reporting phase, a property essential for effective real-time rumor detection.

Implications

The findings have broad implications for media practitioners, government agencies, and developers of social media monitoring tools. By automating the detection process and reducing dependence on time-consuming manual intervention, this approach can help identify potential misinformation promptly, limiting its influence and the social harm associated with false reporting.

On a theoretical level, this work sheds light on the importance of incorporating temporal and contextual dynamics into machine learning models and opens pathways for further research into real-time analysis and response systems for breaking news.

Future Developments

This study suggests that further exploration into leveraging sequential classifiers across other forms of media and event domains could refine rumor detection precision. Additionally, expanding word vector models and enriching feature sets could enhance classifier accuracy even further.
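
Since word vector models and richer feature sets are named as levers for improvement, here is a sketch of how a per-tweet feature vector might combine an averaged word embedding with a few surface cues. The tiny embedding table, the tokenizer, and the particular surface features are assumptions for illustration; a real system would load pretrained vectors (for example word2vec) rather than a hand-written dictionary.

```python
import re
from typing import Dict, List

# Hypothetical 3-dimensional embeddings; real vectors would be pretrained.
EMBEDDINGS: Dict[str, List[float]] = {
    "police": [0.1, 0.3, -0.2],
    "gunman": [0.4, -0.1, 0.2],
    "reported": [0.0, 0.2, 0.5],
}
DIM = 3
TOKEN_RE = re.compile(r"[a-z']+")


def tweet_features(text: str) -> List[float]:
    """Averaged word vector followed by four surface features."""
    tokens = TOKEN_RE.findall(text.lower())
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if known:
        avg = [sum(col) / len(known) for col in zip(*known)]
    else:
        avg = [0.0] * DIM  # no known words: fall back to a zero vector
    letters = [c for c in text if c.isalpha()]
    capital_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return avg + [
        float("?" in text),   # question mark present
        float("!" in text),   # exclamation mark present
        capital_ratio,        # share of upper-case letters
        float(len(tokens)),   # word count
    ]
```

In a sequential setup, one such vector per tweet would form the emission features of the CRF.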

Future work might also address rumors that are less prominent or quickly retracted, which are hard to capture because the annotation focused on tweets that passed a retweet threshold. This research is an important step in the ongoing development of automated systems that guard against the spread of false information online.
