A Baseline Approach to Stance Detection in the Fake News Challenge
The paper "A simple but tough-to-beat baseline for the Fake News Challenge stance detection task" presents an effective methodology for stance detection, a crucial subtask in the broader context of automatic fact-checking. The authors, from University College London and the University of Copenhagen, describe a straightforward yet robust system that placed third in the Fake News Challenge (FNC-1). The challenge asked participants to build systems that determine the stance of a news article body with respect to its headline, classifying each headline-body pair as 'agree', 'disagree', 'discuss', or 'unrelated'.
System Overview
The proposed system feeds lexical and similarity features into a multi-layer perceptron (MLP) with a single hidden layer. The model relies on simple but effective bag-of-words (BOW) text representations: term frequency (TF) and term frequency-inverse document frequency (TF-IDF). Specifically, the feature set concatenates the TF vectors of the headline and the body with the cosine similarity between their TF-IDF vectors. The simplicity of this approach, combined with its competitive performance against the more intricate ensemble models of other participants, makes it a useful reference baseline for further research on stance detection.
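The feature construction described above can be sketched with scikit-learn. This is an illustrative reimplementation, not the authors' code: the toy headlines and bodies and the vocabulary cap are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy headline/body pairs (placeholders, not FNC-1 data).
headlines = ["Robots will take over the world", "New vaccine shows promise"]
bodies = ["Experts dismiss claims that robots will take over.",
          "Early trials of the new vaccine show promising results."]

# Term-frequency (bag-of-words) vectors over a shared vocabulary.
tf = CountVectorizer(max_features=5000)
tf.fit(headlines + bodies)
tf_head = tf.transform(headlines).toarray()
tf_body = tf.transform(bodies).toarray()

# TF-IDF vectors, used only to compute the headline/body cosine similarity.
tfidf = TfidfVectorizer(max_features=5000).fit(headlines + bodies)
v_head = tfidf.transform(headlines).toarray()
v_body = tfidf.transform(bodies).toarray()

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

cos_sim = np.array([[cosine(h, b)] for h, b in zip(v_head, v_body)])

# Final feature matrix: [TF(headline) | TF(body) | cosine(TF-IDF)]
X = np.hstack([tf_head, tf_body, cos_sim])
```

Each row of `X` is the input the MLP receives for one headline-body pair.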
Training and Performance
Training minimized a cross-entropy loss with the Adam optimizer, regularized with dropout and ℓ2 weight decay. Under the FNC-1 evaluation metric, the system attained a score of 81.72%, placing it just behind the leading ensembles, which used deeper networks and richer feature sets. Its high accuracy of 96.55% in separating 'related' from 'unrelated' pairs underlines its utility, though it struggled to distinguish the 'agree' and 'disagree' stances.
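The 81.72% figure is a relative score under the FNC-1 weighted metric, which rewards the related/unrelated decision separately from the finer-grained stance label. A minimal sketch of that scoring scheme, assuming the commonly described weighting of the official scorer:

```python
RELATED = {"agree", "disagree", "discuss"}

def fnc_score(gold, pred):
    """FNC-1 weighted score: 0.25 for getting related vs. unrelated right,
    plus 0.75 more for predicting the correct related stance."""
    score = 0.0
    for g, p in zip(gold, pred):
        if g == p:
            score += 0.25
            if g != "unrelated":
                score += 0.50
        if g in RELATED and p in RELATED:
            score += 0.25  # related/unrelated decision was correct
    return score

def fnc_relative(gold, pred):
    # Relative score: points achieved divided by the maximum attainable
    # (1.0 per related example, 0.25 per unrelated example).
    best = sum(1.0 if g in RELATED else 0.25 for g in gold)
    return fnc_score(gold, pred) / best
```

Because a correct 'unrelated' call earns only 0.25 points while a fully correct related stance earns 1.0, a system can score well overall while still confusing 'agree' and 'disagree', which matches the behavior reported for this baseline.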
Implications and Future Directions
The work demonstrates a practical approach to stance detection, a pivotal step in automated content verification pipelines. The simplicity of the architecture makes it a robust, transparent, and interpretable starting point for further development. Future work may explore refined feature sets, more sophisticated text representations, and larger or rebalanced training data to improve 'agree' and 'disagree' detection. The underlying methodology should also transfer to other domains where text-pair classification and stance evaluation are applicable.
In conclusion, this paper provides a valuable contribution in presenting a simple yet effective baseline for stance detection, opening avenues for subsequent enhancements and adaptations in automated misinformation detection systems. The system stands as a testament to the viability of straightforward machine learning architectures in achieving competitive performance within complex NLP tasks.